Short Communication: GPU Cache Partitioning for Multi-Tenant Inference
Abstract
Hardware-level cache partitioning improves fairness in multi-tenant GPU inference, yielding a 9–12% improvement in throughput stability.
Cite this article
Janssen, L. & Qureshi, Z. (2023). Short Communication: GPU Cache Partitioning for Multi-Tenant Inference. Research Explorations in Global Knowledge & Technology (REGKT), 2 (8). Retrieved from https://regkt.com/article.php?id=435&slug=short-communication-gpu-cache-partitioning-multi-tenant-inference