Short Communication: GPU Cache Partitioning for Multi-Tenant Inference

Received: Dec 3, 2023
Published: Dec 31, 2023
Authors: Luka Janssen ✉, Ziva Qureshi

Abstract

Hardware-level cache partitioning improves fairness in multi-tenant GPU inference, yielding a 9–12% improvement in throughput stability.


Cite this article

Janssen, L., & Qureshi, Z. (2023). Short Communication: GPU Cache Partitioning for Multi-Tenant Inference. Research Explorations in Global Knowledge & Technology (REGKT), 2(8). Retrieved from https://regkt.com/article.php?id=435&slug=short-communication-gpu-cache-partitioning-multi-tenant-inference
