Short Communication: Speculative Decoding with Dynamic Draft Width

Received: Jul 2, 2025
Published: Aug 10, 2025
Authors: Yusuf Demir ✉, Elena Popov

Abstract

We propose a controller that adapts the draft width in speculative decoding using entropy and token-level cache hit rates. On multilingual assistants, throughput rose 26–33% with negligible degradation in answer quality.
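The abstract does not specify how the controller combines its two signals, so the following is only an illustrative sketch, not the authors' method. It assumes that "draft width" means the number of tokens proposed per step, that entropy is computed over the draft model's next-token distribution, and that the cache hit rate is a number in [0, 1]. The function names, the product scoring rule, and the default bounds `min_width=1` and `max_width=8` are all hypothetical.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def choose_draft_width(probs, cache_hit_rate, min_width=1, max_width=8):
    """Hypothetical rule: draft more tokens when the draft model is confident
    (low normalized entropy) and the cache hit rate is high.

    The scoring rule and the width bounds are assumptions for illustration;
    the article's abstract does not describe the actual controller.
    """
    if len(probs) < 2:
        confidence = 1.0  # a single-outcome distribution has no uncertainty
    else:
        # Normalize entropy by its maximum, log(vocabulary size), to get [0, 1].
        confidence = 1.0 - token_entropy(probs) / math.log(len(probs))
    confidence = min(1.0, max(0.0, confidence))
    hit = min(1.0, max(0.0, cache_hit_rate))

    # Assumed combination: both signals must be high to justify a wide draft.
    score = confidence * hit
    width = min_width + round(score * (max_width - min_width))
    return max(min_width, min(max_width, width))
```

Under these assumptions, a uniform (maximally uncertain) distribution or a zero cache hit rate falls back to the minimum width, while a fully peaked distribution with a perfect hit rate uses the maximum width.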

Cite this article

Demir, Y. & Popov, E. (2025). Short Communication: Speculative Decoding with Dynamic Draft Width. Research Explorations in Global Knowledge & Technology (REGKT), 3(4). Retrieved from https://regkt.com/article.php?id=113&slug=short-communication-speculative-decoding-with-dynamic-draft-width
