Research Article: Compiler-Guided Quantization for Transformer Inference
Abstract
We present a compiler pass that selects per-layer quantization schemes via profile-guided search, yielding a 1.32× speedup with <0.3% accuracy loss across five LLMs.
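The abstract does not say how the search is carried out. The following is a minimal sketch of one way a per-layer, profile-guided selection could work, not the authors' method. It assumes three things: each layer has been profiled under each candidate scheme to give a latency and an accuracy-loss estimate, the per-layer losses add up to a model-level loss, and a greedy rule is used in place of whatever search the paper actually performs. The names `Profile` and `select_schemes`, the scheme labels, and the profile numbers are all hypothetical toy values, not measurements from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical profiled cost of one layer under one quantization scheme."""
    latency_ms: float  # measured latency of the layer
    acc_loss: float    # accuracy drop (percentage points) attributed to this layer


def select_schemes(profiles, baseline, budget):
    """Greedy per-layer scheme selection under an additive accuracy-loss budget.

    profiles: one dict per layer, mapping scheme name -> Profile
    baseline: scheme every layer starts from (e.g. "fp16")
    budget:   maximum total accuracy loss allowed, in percentage points
    Assumes per-layer losses are additive, which real models need not satisfy.
    """
    choice = [baseline] * len(profiles)
    total_loss = sum(layer[baseline].acc_loss for layer in profiles)
    while True:
        best = None  # (score, layer index, scheme name, added loss)
        for i, layer in enumerate(profiles):
            cur = layer[choice[i]]
            for name, cand in layer.items():
                saved = cur.latency_ms - cand.latency_ms
                extra = cand.acc_loss - cur.acc_loss
                # Only consider strictly faster moves that stay within budget.
                if saved <= 0 or total_loss + extra > budget:
                    continue
                # Rank moves by latency saved per unit of added accuracy loss.
                score = saved / max(extra, 1e-9)
                if best is None or score > best[0]:
                    best = (score, i, name, extra)
        if best is None:
            # No remaining move is both faster and within budget.
            return choice, total_loss
        _, i, name, extra = best
        choice[i] = name
        total_loss += extra


# Toy two-layer example with made-up profile numbers (illustrative only).
profiles = [
    {"fp16": Profile(2.0, 0.00), "int8": Profile(1.4, 0.05), "int4": Profile(1.0, 0.40)},
    {"fp16": Profile(3.0, 0.00), "int8": Profile(2.1, 0.10), "int4": Profile(1.5, 0.20)},
]
choice, loss = select_schemes(profiles, "fp16", budget=0.3)
print(choice, round(loss, 2))
```

On these toy numbers the sketch keeps layer 0 at `int8`, because `int4` would exceed the 0.3-point budget. It pushes layer 1 down to `int4`, for a total estimated loss of about 0.25 points. Every accepted move strictly lowers total latency, so the loop always terminates. A greedy rule like this is cheap but can miss the best combination. A profile-guided compiler pass might use a more exhaustive or learned search instead.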
Cite this article
Rivera, A. (2025). Research Article: Compiler-Guided Quantization for Transformer Inference. Research Explorations in Global Knowledge & Technology (REGKT), 4 (1). Retrieved from https://regkt.com/article.php?id=206&slug=compiler-guided-quantization-transformer-inference