Chaos Engineering in Production DevOps: Designing Safe Experiments to Improve System Resilience

research-article
Received: Jun 10, 2025
Published: Aug 18, 2025
Authors:

Abstract

Chaos engineering is increasingly used to validate resilience assumptions, but experimenting in production requires explicit safety boundaries. This research proposes a design for a chaos engineering program that combines hypothesis-driven experiments, blast-radius constraints, automated rollback, and reliability scoring aligned with service objectives. The study evaluates fault-injection scenarios including dependency degradation, network partitions, and resource exhaustion. The results indicate greater confidence in failover mechanisms and reduced incident severity as resilience improvements were validated.
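
To make the combination of safeguards named in the abstract concrete, the following is a minimal illustrative sketch in Python, not the article's implementation. It shows one way a hypothesis-driven experiment could be wrapped in a blast-radius check and an automated rollback. The `Experiment` fields, the `run_experiment` function, the 5% blast-radius ceiling, and the error-rate threshold are all assumptions introduced for illustration. Reliability scoring is not covered here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Experiment:
    """Hypothetical experiment description; every field name is an assumption."""
    name: str
    blast_radius_pct: float        # share of traffic or hosts exposed to the fault
    max_error_rate: float          # steady-state hypothesis: error rate stays at or below this
    inject: Callable[[], None]     # starts the fault (hypothetical hook)
    rollback: Callable[[], None]   # removes the fault (hypothetical hook)
    probe: Callable[[], float]     # returns the currently observed error rate


def run_experiment(exp: Experiment,
                   max_blast_radius_pct: float = 5.0,
                   checks: int = 3) -> str:
    # Blast-radius guard: refuse to run before anything is injected.
    if exp.blast_radius_pct > max_blast_radius_pct:
        return "rejected"
    # The steady state must hold before the fault; otherwise the result says nothing.
    if exp.probe() > exp.max_error_rate:
        return "skipped"
    exp.inject()
    try:
        for _ in range(checks):
            # Stop at the first observation that violates the hypothesis.
            if exp.probe() > exp.max_error_rate:
                return "hypothesis_refuted"
        return "hypothesis_held"
    finally:
        # Automated rollback runs on every exit path, including exceptions.
        exp.rollback()
```

In this sketch the rollback sits in a `finally` clause so that a refuted hypothesis, a completed run, and an unexpected exception all remove the fault. A real program would likely poll the probe over time and bound the experiment's duration as well.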

Cite this article

(2025). Chaos Engineering in Production DevOps: Designing Safe Experiments to Improve System Resilience. Research Explorations in Global Knowledge & Technology (REGKT), 4(3). Retrieved from https://regkt.com/article.php?id=786&slug=chaos-engineering-production-devops-designing-safe-experiments-improve-system-resilience
