Research Article: Latency SLO Brokers for Multi-Model Inference Gateways
Abstract
We present an SLO broker that routes inference calls across heterogeneous models and accelerators so that each tenant's P95 latency target is met while spending stays within cost caps.
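To make the routing idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes the broker has an observed P95 latency and an estimated per-call cost for each backend, and that the cost cap applies per call. The abstract specifies none of these details, so the names `Backend`, `TenantPolicy`, and `pick_backend` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Backend:
    name: str             # hypothetical model/accelerator pairing
    p95_ms: float         # observed P95 latency in milliseconds (assumed available)
    cost_per_call: float  # estimated cost of one call, arbitrary units (assumed)


@dataclass(frozen=True)
class TenantPolicy:
    p95_target_ms: float      # per-tenant P95 latency target
    cost_cap_per_call: float  # assumed form of the cost cap: a per-call ceiling


def pick_backend(backends: Sequence[Backend],
                 policy: TenantPolicy) -> Optional[Backend]:
    """Return the cheapest backend that meets the latency target and cost cap.

    Returns None when no backend satisfies both constraints, leaving the
    fallback decision (reject, queue, degrade) to the caller.
    """
    eligible = [
        b for b in backends
        if b.p95_ms <= policy.p95_target_ms
        and b.cost_per_call <= policy.cost_cap_per_call
    ]
    if not eligible:
        return None
    # Prefer lower cost; break ties by lower observed P95.
    return min(eligible, key=lambda b: (b.cost_per_call, b.p95_ms))


# Illustrative data only; these figures are not from the article.
backends = [
    Backend("small-cpu", p95_ms=180.0, cost_per_call=0.2),
    Backend("medium-gpu", p95_ms=90.0, cost_per_call=0.5),
    Backend("large-gpu", p95_ms=60.0, cost_per_call=1.0),
]
policy = TenantPolicy(p95_target_ms=100.0, cost_cap_per_call=0.8)
chosen = pick_backend(backends, policy)
```

With the illustrative values above, `small-cpu` misses the latency target and `large-gpu` exceeds the cost cap, so `chosen` is the `medium-gpu` backend. A production broker would likely also account for queueing, load, and measurement noise, which this sketch ignores.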
Cite this article
Martinez, Q. & Santos, S. (2023). Research Article: Latency SLO Brokers for Multi-Model Inference Gateways. Research Explorations in Global Knowledge & Technology (REGKT), 2(10). Retrieved from https://regkt.com/article.php?id=566&slug=latency-slo-brokers-multi-model-inference-gateways