Short Communication: Speculative Decoding with Dynamic Draft Width
Abstract
We propose a controller that adapts draft width in speculative decoding using entropy and token-level cache hit rates. On multilingual assistants, throughput rose 26–33% with negligible degradation in answer quality.
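The abstract does not specify the control rule, so the following is only a minimal sketch of how such a controller might combine the two signals. The class name `DraftWidthController`, the helper `token_entropy`, the linear scoring rule, and every default value (`min_width`, `max_width`, `max_entropy`, `hit_weight`) are hypothetical assumptions made for illustration, not the authors' method.

```python
import math
from dataclasses import dataclass


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


@dataclass
class DraftWidthController:
    # All fields are illustrative assumptions, not values from the article.
    min_width: int = 1         # smallest number of draft tokens to propose
    max_width: int = 8         # largest number of draft tokens to propose
    max_entropy: float = 4.0   # entropy at or above this yields zero confidence
    hit_weight: float = 0.5    # share of the score contributed by cache hit rate

    def width(self, entropy: float, cache_hit_rate: float) -> int:
        """Map (entropy, cache hit rate) to a draft width in [min_width, max_width]."""
        # Low entropy means the draft model is confident, so drafting more is cheaper.
        confidence = 1.0 - min(max(entropy / self.max_entropy, 0.0), 1.0)
        # Clamp the hit rate so malformed inputs cannot push the width out of range.
        hit = min(max(cache_hit_rate, 0.0), 1.0)
        # Assumed rule: a weighted average of the two signals, scaled to the width range.
        score = (1.0 - self.hit_weight) * confidence + self.hit_weight * hit
        return self.min_width + round(score * (self.max_width - self.min_width))


controller = DraftWidthController()
confident = controller.width(entropy=0.0, cache_hit_rate=1.0)   # widest draft
uncertain = controller.width(entropy=4.0, cache_hit_rate=0.0)   # narrowest draft
```

Under these assumptions the width grows as entropy falls or the hit rate rises, and it stays within the configured bounds. A real controller would likely need smoothing across decoding steps and tuning against measured acceptance rates.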
Cite this article
Demir, Y. & Popov, E. (2025). Short Communication: Speculative Decoding with Dynamic Draft Width. Research Explorations in Global Knowledge & Technology (REGKT), 3 (4). Retrieved from https://regkt.com/article.php?id=113&slug=short-communication-speculative-decoding-with-dynamic-draft-width