Flaky Test Triage Playbook
Flaky tests are not just annoying. Once the team stops trusting failures, the whole CI pipeline loses its ability to protect production.
First classify the failure
Not every flaky test has the same cause. Common classes include:
- timing and async race conditions
- shared test data contamination
- environment instability
- weak selectors or brittle assertions
Classification matters because the fix strategy is different for each one.
Triage before you “fix”
When a flaky test appears, answer these questions first:
- how often does it fail?
- does it block deploys?
- is the underlying product behavior also unstable?
- can the failure be isolated to one environment or browser?
Answering them first keeps a team from spending a day on a low-value symptom while a more dangerous flaky path stays in CI.
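One way to make the triage questions comparable across tests is to turn the answers into a rough priority score. The sketch below is only an illustration: the `FlakyReport` fields, the `triage_score` function, and every weight in it are assumptions made up for this example, not calibrated values.

```python
from dataclasses import dataclass


@dataclass
class FlakyReport:
    """Answers to the four triage questions for one test (hypothetical schema)."""
    name: str
    failure_rate: float        # fraction of recent runs that failed, 0.0 to 1.0
    blocks_deploys: bool
    product_unstable: bool     # the underlying product behavior is also unstable
    isolated_to_one_env: bool  # reproduces in only one environment or browser


def triage_score(report: FlakyReport) -> float:
    """Return a rough priority; higher means look at it sooner.

    All weights are illustrative assumptions.
    """
    score = report.failure_rate * 10
    if report.blocks_deploys:
        score += 5
    if report.product_unstable:
        score += 8  # a possible real defect outranks pure test noise
    if report.isolated_to_one_env:
        score -= 1  # assumed: a narrow scope is cheaper to defer
    return score


reports = [
    FlakyReport("test_checkout_total", 0.05, True, True, False),
    FlakyReport("test_tooltip_hover", 0.30, False, False, True),
]
# Sort so the most urgent flaky test comes first.
ranked = sorted(reports, key=triage_score, reverse=True)
```

With these sample values, the rarely failing but deploy-blocking checkout test ranks above the noisier tooltip test, which is the kind of ordering the questions are meant to surface.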
Contain blast radius quickly
If a flaky test is high-noise and low-signal, contain it:
- quarantine it with visibility
- lower its release-gate weight temporarily
- assign an owner and due date
The mistake is leaving it unowned while everyone silently clicks rerun.
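Containment is easier to enforce when a quarantine entry cannot exist without an owner and a due date. The following is a minimal sketch of such a record; the `Quarantine` class, its fields, and the `overdue` helper are hypothetical names chosen for illustration and do not correspond to any particular CI tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Quarantine:
    """One quarantined test (hypothetical record layout)."""
    test_id: str
    owner: str
    due: date
    reason: str

    def __post_init__(self):
        # Refuse entries with no owner, so a test cannot be parked anonymously.
        if not self.owner.strip():
            raise ValueError(f"{self.test_id}: quarantine requires an owner")


def overdue(entries, today):
    """Return the quarantined tests whose due date has already passed."""
    return [e for e in entries if e.due < today]


entries = [
    Quarantine("test_export_csv", "dana", date(2024, 3, 1), "intermittent timeout"),
    Quarantine("test_login_sso", "lee", date(2024, 4, 15), "shared account state"),
]
late = overdue(entries, today=date(2024, 3, 10))
```

Publishing the `overdue` list somewhere the team already looks (a dashboard or a weekly report, for instance) is one way to keep the quarantine visible rather than silent.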
Fix the root cause, not the retry count
Retries are sometimes useful for diagnosis, but repeated retries should not become the permanent solution. Strong fixes usually involve:
- deterministic test data
- explicit readiness checks
- narrower assertions
- removing hidden cross-test state
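As an illustration of an explicit readiness check, the sketch below polls a condition until it holds or a deadline passes, instead of sleeping for a fixed time or retrying the whole test. The helper name `wait_until` and its default timeout and interval are assumptions for this example; many test frameworks ship their own equivalent.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool],
               timeout: float = 5.0,
               interval: float = 0.05) -> None:
    """Poll `condition` until it returns True, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(interval)


# Usage: stand-in for a dependency that becomes ready shortly after start.
ready_at = time.monotonic() + 0.1
wait_until(lambda: time.monotonic() >= ready_at, timeout=2.0)
```

The test then proceeds as soon as the dependency is ready, and a genuine readiness failure surfaces as a clear `TimeoutError` rather than as an unrelated assertion error further down.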
Track flakiness as an operational metric
Measure:
- top failing tests
- rerun frequency
- quarantine age
- build time lost to retries
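The four measurements above can be derived from per-attempt run records. This is a minimal sketch under assumed data: the `Run` record and its fields are a hypothetical schema, and a real CI system would expose this information in its own format.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Run:
    """One test attempt (hypothetical schema)."""
    test_id: str
    passed: bool
    attempt: int    # 1 = first attempt, 2 or more = rerun
    seconds: float


def top_failing(runs, n=3):
    """Tests with the most failed attempts, most frequent first."""
    return Counter(r.test_id for r in runs if not r.passed).most_common(n)


def rerun_frequency(runs):
    """Fraction of all attempts that were reruns."""
    if not runs:
        return 0.0
    return sum(1 for r in runs if r.attempt > 1) / len(runs)


def seconds_lost_to_retries(runs):
    """Total time spent on attempts after the first."""
    return sum(r.seconds for r in runs if r.attempt > 1)


def quarantine_age_days(quarantined_on: date, today: date) -> int:
    """Whole days a test has spent in quarantine."""
    return (today - quarantined_on).days


runs = [
    Run("test_login", False, 1, 12.0),
    Run("test_login", True, 2, 11.0),
    Run("test_search", True, 1, 4.0),
    Run("test_login", False, 1, 12.0),
    Run("test_login", False, 2, 12.5),
]
```

Tracked over time, these numbers show whether flakiness is shrinking or merely being hidden behind reruns.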
A reliable pipeline is built by treating test trust as a product, not as a side effect.