Research Article: Multi-Objective Compilers for LLM Serving
Abstract
We co-optimize kernel fusion, quantization, and memory layout using a profile-guided compiler, delivering 1.28× speedup with <0.4% accuracy delta.
Cite this article
Wang, M., Li, O., & Rodriguez, L. (2024). Research Article: Multi-Objective Compilers for LLM Serving. Research Explorations in Global Knowledge & Technology (REGKT), 3(12). Retrieved from https://regkt.com/article.php?id=311&slug=multi-objective-compilers-llm-serving