Research Article: Latency SLO Brokers for Multi-Model Inference Gateways

Received: Dec 10, 2023
Published: Dec 31, 2023
Authors: Qian Martinez ✉, Sara Santos

Abstract

We present an SLO broker that routes inference calls across heterogeneous models and accelerators to meet per-tenant P95 latency targets under cost caps.

Cite this article

Martinez, Q. & Santos, S. (2023). Research Article: Latency SLO Brokers for Multi-Model Inference Gateways. Research Explorations in Global Knowledge & Technology (REGKT), 2(10). Retrieved from https://regkt.com/article.php?id=566&slug=latency-slo-brokers-multi-model-inference-gateways
