Research Article: Self-Healing DAGs with Learned Retry Policies
Abstract
We train a policy to adjust retries and backoff using pipeline telemetry. In production, failed task MTTR decreased by 41% and resourcing costs dropped 12%.
Cite this article
Aziz, F. & Sokolov, M. (2025). Research Article: Self-Healing DAGs with Learned Retry Policies. Research Explorations in Global Knowledge & Technology (REGKT), 3 (6). Retrieved from https://regkt.com/article.php?id=127&slug=self-healing-dags-with-learned-retry-policies