Chaos Engineering in Production DevOps: Designing Safe Experiments to Improve System Resilience
Abstract
Chaos engineering is increasingly used to validate resilience assumptions, yet experimenting in production requires careful safety boundaries. This paper proposes a chaos engineering program design that combines hypothesis-driven experiments, blast-radius constraints, automated rollback, and reliability scoring aligned with service objectives. The study evaluates fault injection scenarios including dependency degradation, network partitions, and resource exhaustion. Results indicate greater confidence in failover mechanisms and lower incident severity where resilience improvements were validated through experiments.
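As a rough illustration of how hypothesis-driven experiments, blast-radius constraints, and automated rollback might fit together, the following minimal Python sketch may help. It is not the program design evaluated in the study: the `Experiment` fields, the `MAX_BLAST_RADIUS_PCT` cap of 5%, and the `run_experiment` helper with its `inject`, `rollback`, and `observe_error_rate` callbacks are all hypothetical names chosen for this example, and reliability scoring is left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumed program-wide cap on how much of the fleet one experiment may touch.
MAX_BLAST_RADIUS_PCT = 5.0


@dataclass
class Experiment:
    name: str
    hypothesis: str          # steady-state claim the experiment tries to refute
    blast_radius_pct: float  # share of hosts or traffic exposed to the fault
    max_error_rate: float    # guardrail: abort once the observed rate exceeds this


def run_experiment(
    exp: Experiment,
    inject: Callable[[], None],
    rollback: Callable[[], None],
    observe_error_rate: Callable[[], float],
    steps: int = 5,
) -> Dict[str, object]:
    """Run one fault injection with a blast-radius check and automatic rollback."""
    # Refuse to start if the requested blast radius exceeds the cap.
    if exp.blast_radius_pct > MAX_BLAST_RADIUS_PCT:
        return {"status": "rejected", "samples": []}

    samples: List[float] = []
    inject()
    try:
        for _ in range(steps):
            rate = observe_error_rate()
            samples.append(rate)
            # Guardrail breached: stop observing and let `finally` roll back.
            if rate > exp.max_error_rate:
                return {"status": "aborted", "samples": samples}
        return {"status": "hypothesis_held", "samples": samples}
    finally:
        # Rollback runs on every exit path once the fault has been injected.
        rollback()
```

In this sketch the rollback sits in a `finally` clause, so the fault is removed whether the hypothesis holds, the guardrail trips, or an observation raises an exception. A real program would presumably tie `max_error_rate` to the service objectives mentioned above rather than to a hand-picked constant.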
Cite this article
(2025). Chaos Engineering in Production DevOps: Designing Safe Experiments to Improve System Resilience. Research Explorations in Global Knowledge & Technology (REGKT), 4(3). Retrieved from https://regkt.com/article.php?id=786&slug=chaos-engineering-production-devops-designing-safe-experiments-improve-system-resilience