Anatomy of a flaky test
A test in this repo's end-to-end suite started failing: click a card on a kanban board, assert the URL changes to the card's route. The click happened. The URL didn't. Locally it passed every time; under load it failed most of the time. A classic flake — and a nice case study in how the obvious fix can be the wrong one.
The reflex fix
Flaky click? Retry the click. Playwright even has a blessed idiom for it:
```typescript
await expect(async () => {
  await page.getByRole("link", { name: "Automate the thing" }).click();
  await expect(page).toHaveURL(/\/cards\//, { timeout: 3_000 });
}).toPass({ timeout: 30_000 });
```

This made the test worse. It now failed for a full thirty seconds instead of five. That's the first useful signal: a retry that doesn't help means the failure isn't transient. Something was actively undoing the navigation, every time, and the retry was part of the loop.
Read the trace, not the tea leaves
With tracing enabled, Playwright records a full trace on failure: every action, every network request, a DOM snapshot at the moment of death. Two facts fell out of it.
First, the click always registered — the trace showed the route's data being fetched, with a 200, every time. So "the click is lost" was dead as a theory. Second, that fetch took four to six seconds on the loaded machine, because the dev server compiles routes on first visit. The original failure was never mysterious: fetch plus render simply took longer than the assertion's five-second budget. The fix the evidence wanted was embarrassingly small — one click, a generous timeout.
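Whether a trace exists after a failure depends on configuration. A minimal sketch of a config that keeps traces only for failing tests, assuming a standard `playwright.config.ts` (this repo's actual config is not shown here and may differ):

```typescript
// playwright.config.ts (assumed file name and location)
import { defineConfig } from "@playwright/test";

export default defineConfig({
  use: {
    // Record a trace for every test, but keep it only when the test fails.
    trace: "retain-on-failure",
  },
});
```

A saved trace can then be opened with `npx playwright show-trace <path-to-trace.zip>`. The network view is where a 200 response and a multi-second fetch like the ones described above would be visible.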
But then why did retrying fail for thirty seconds straight?
The retry was closing the thing it opened
The card opens as a full-screen dialog over the board. Dialogs close on outside interaction — that's correct UI behavior. Now replay the retry loop in slow motion: the first click starts a slow navigation; the loop times out and clicks again; the dialog from the first click mounts at that exact moment; the second click's pointer-down lands outside it; the dialog dutifully closes and navigates back to the board. Repeat until the budget runs out.
The retry wasn't failing to fix the race. The retry was the race. Each attempt manufactured the exact interaction that destroyed the previous attempt's success.
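A loop like this leaves a recognizable signature: the URL enters the card route and then leaves it again, once per attempt. As an illustration only, here is a small helper that counts such bounces in a list of URLs a test observed. The function name, the example paths, and the idea of logging URLs at all are assumptions for this sketch, not part of the original test.

```typescript
// Hypothetical diagnostic helper: counts how many times a URL history
// enters the card route and then leaves it again.
// Note: pass a RegExp without the `g` flag, since `test()` on a global
// RegExp is stateful across calls.
function countBounces(urls: string[], cardRoute: RegExp = /\/cards\//): number {
  let bounces = 0;
  let onCard = false;
  for (const url of urls) {
    const isCard = cardRoute.test(url);
    // Leaving the card route right after being on it is one "bounce".
    if (onCard && !isCard) bounces += 1;
    onCard = isCard;
  }
  return bounces;
}

// Example paths are made up for illustration.
const singleClick = ["/board", "/cards/42"];
const retryLoop = ["/board", "/cards/42", "/board", "/cards/42", "/board"];

console.log(countBounces(singleClick)); // 0
console.log(countBounces(retryLoop)); // 2
```

A count that grows with the number of retry attempts suggests the test is undoing its own navigation rather than suffering a transient failure.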
What generalizes
- A retry that doesn't reduce failures is evidence, not bad luck. It says the failure is being caused, not suffered — go find the feedback loop.
- Traces beat theories. Both wrong hypotheses here ("click is lost", "refresh eats the navigation") died in minutes once the network log was open. Neither would have died from staring at the test code.
- Waiting is underrated. Test machinery that re-drives the UI — retries, re-clicks, re-fills — interacts with real UI behavior like outside-click dismissal in ways single user actions don't. When the action provably happened, the right move is usually to wait longer for its effect, not do it again.
- Flakes are load-shaped. This one never reproduced on an idle machine but reproduced reliably when the test ran alongside everything else. The failing environment is part of the bug report.
The final diff was two lines: one click, thirty-second budget, and a comment explaining why retrying is forbidden there. The investigation is the part worth writing down.
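A sketch of what that shape can look like as a reusable helper. This is not the actual diff: the helper name, the `ClickablePage` interface, and the use of `waitForURL` in place of the `expect(page).toHaveURL` assertion shown earlier are assumptions made so the example is self-contained.

```typescript
// Minimal structural slice of a Playwright-style page (an assumption for
// this sketch; a real Playwright `page` is expected to satisfy it).
interface ClickablePage {
  getByRole(role: "link", options: { name: string }): { click(): Promise<void> };
  waitForURL(url: RegExp, options: { timeout: number }): Promise<void>;
}

// Hypothetical helper: click exactly once, then only wait.
// Do NOT wrap this in a retry such as expect(...).toPass(): the card opens
// as a dialog that closes on outside interaction, so a second click can
// dismiss the dialog the first click opened.
async function openCardOnce(
  page: ClickablePage,
  cardName: string,
  budgetMs: number = 30_000,
): Promise<void> {
  await page.getByRole("link", { name: cardName }).click();
  // The generous budget covers the slow first-visit fetch plus render.
  await page.waitForURL(/\/cards\//, { timeout: budgetMs });
}
```

Inside a test this would be called as `await openCardOnce(page, "Automate the thing")`. Keeping the click and the wait in one place gives the "no retry" comment a single home.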