Detecting and Fixing Flaky Playwright Tests

A test that fails one run in fifty is invisible to a single CI pass, so the first job is to make the flakiness reproduce on demand. Once you can trigger it reliably, the fix is almost always mechanical: replace a hard waitForTimeout() with a web-first assertion that waits on the exact condition the test cares about. This page gives a numbered, repeatable procedure for catching intermittent failures with --repeat-each, reading the flaky markers CI emits, and converting timing-based waits into deterministic ones. It sits under Flaky Test Management, part of the wider Debugging & Test Observability guide.

A fixed wait races the network and breaks when the response is slow; a web-first assertion polls until the condition holds, so it passes whenever the condition is met within the assertion timeout, whatever the latency.

Detect flakiness deliberately

A flaky test will not reveal itself in one run, so reproduce it under load. The --repeat-each flag runs every selected test N times in a single invocation, multiplying your chances of hitting the bad timing window. Take a spec like this one:

import { test, expect } from '@playwright/test';

test('add to cart updates the badge', async ({ page }) => {
  await page.goto('/cart');
  await page.getByRole('button', { name: 'Add to cart' }).click();
  await expect(page.getByTestId('cart-count')).toHaveText('1');
});

Run it fifty times against the suspect spec and watch for a single red:

npx playwright test cart.spec.ts --repeat-each=50 --workers=4

If even one of the fifty fails, the test is flaky. Running with several workers also surfaces shared-state collisions that a single worker would never expose.

Detect flakiness in CI

CI catches the flakiness you cannot reproduce locally. When retries is set in playwright.config.ts, Playwright reruns a failed test and, if a later attempt passes, marks the result flaky rather than passed or failed. The HTML reporter lists every flaky test with the attempt that failed and the attempt that recovered, which is your queue of bugs to fix. Treat a non-zero flaky count as a build smell even when the job is green.
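A minimal configuration sketch for this behavior follows. It assumes the pipeline sets a CI environment variable (common, but not universal), and the retry count of 2 is an illustrative value rather than a recommendation from this page.

```typescript
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Retry only in CI so local failures stay loud.
  // Assumption: the pipeline exports CI; the count of 2 is illustrative.
  retries: process.env.CI ? 2 : 0,

  // The HTML report labels a test that failed and then passed on retry as flaky.
  reporter: 'html',

  use: {
    // Record a trace when a failed test is retried for the first time.
    trace: 'on-first-retry',
  },
});
```

With retries at 0 locally, a flake shows up as a plain failure on your machine, while CI still records it as flaky instead of hiding it behind a green retry.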

Web-first assertions versus hard waits

The root of most Playwright flakiness is a fixed-duration wait. page.waitForTimeout(200) is a bet that the application settles within 200 ms, and you lose that bet whenever CI is slow or the network is cold. Web-first assertions (expect(locator).toHaveText(), toBeVisible(), toBeEnabled(), and the rest) retry automatically until the condition is true or the timeout expires, so they adapt to whatever speed the run happens to have. The fix for nearly every flaky test is to delete the timeout and assert on the real end state instead.
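The difference can be modelled without a browser. The sketch below is a simplified simulation, not Playwright's implementation: the function names fixedWait and retryingAssertion are hypothetical, the 100 ms polling interval is an assumption, and the 5000 ms timeout mirrors Playwright's default expect timeout.

```typescript
// Simplified model of two wait strategies. All times are in milliseconds.
// `settleMs` is the moment the application reaches the state the test checks.
interface Outcome {
  passed: boolean;
  waitedMs: number;
}

// Fixed wait: sleep for `waitMs`, then read the state exactly once.
function fixedWait(settleMs: number, waitMs: number): Outcome {
  return { passed: settleMs <= waitMs, waitedMs: waitMs };
}

// Retrying assertion: re-check every `intervalMs` until the condition holds
// or `timeoutMs` runs out. The interval is an assumed value.
function retryingAssertion(
  settleMs: number,
  timeoutMs: number,
  intervalMs: number = 100,
): Outcome {
  for (let elapsed = 0; elapsed <= timeoutMs; elapsed += intervalMs) {
    if (elapsed >= settleMs) {
      return { passed: true, waitedMs: elapsed };
    }
  }
  return { passed: false, waitedMs: timeoutMs };
}

// A fast local run settles at 50 ms; a slow CI run settles at 350 ms.
const fastFixed = fixedWait(50, 200); // passes, but always waits the full 200 ms
const slowFixed = fixedWait(350, 200); // fails: the read happened too early
const fastRetry = retryingAssertion(50, 5000); // passes at the first check after 50 ms
const slowRetry = retryingAssertion(350, 5000); // passes at the first check after 350 ms

console.log({ fastFixed, slowFixed, fastRetry, slowRetry });
```

In this model the fixed wait fails the slow run and wastes time on the fast one, while the retrying check passes both and only fails when the state never arrives within the timeout.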

Step-by-step fix

1. Reproduce the flake locally. Run the suspect spec with npx playwright test <file> --repeat-each=50 --workers=4. If nothing fails, raise the repeat count or add workers until at least one run goes red, confirming the test is genuinely flaky and not a one-off infrastructure blip.
2. Capture a trace of the failure. Run with --trace on so the failed attempt records a full action timeline, then open the recorded trace.zip with npx playwright show-trace <path> to see the exact action that flipped and whether the assertion ran before the application was ready.
3. Find the hard wait or premature read. Search the spec for waitForTimeout, sleep, and any value read with textContent() or innerText() immediately after an action. These one-shot reads return whatever the DOM holds at that instant and never retry, so they can run before async state settles; they are the usual culprits.
4. Replace it with a web-first assertion. Swap the timed wait for await expect(locator).toHaveText(...), toBeVisible(), or toHaveCount(). The assertion polls until the expectation holds, removing the race without guessing a duration.
5. Isolate any shared state. If the flake only appears with multiple workers, the test depends on data another test mutates. Provision unique data per test through a fixture rather than relying on a clean global database.
6. Re-run the repeat loop to confirm. Run the same --repeat-each=50 command again, or the higher count that first exposed the failure. Treat the fix as confirmed only when the full loop passes; if a failure remains, return to the trace, because a second cause is still present.
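The find-and-replace steps above can be sketched on the cart test from earlier. This is an illustration rather than a real spec from any project: the racy version and its 200 ms wait are assumed for the example, while the cart-count test id and the Add to cart button name come from the earlier snippet.

```typescript
import { test, expect } from '@playwright/test';

// Before: a hypothetical racy version. The fixed wait and the one-shot
// textContent() read both run whether or not the badge has updated.
test('add to cart updates the badge (racy)', async ({ page }) => {
  await page.goto('/cart');
  await page.getByRole('button', { name: 'Add to cart' }).click();
  await page.waitForTimeout(200); // guesses that 200 ms is always enough
  const count = await page.getByTestId('cart-count').textContent();
  expect(count).toBe('1'); // checks a single snapshot, no retry
});

// After: the web-first assertion retries until the badge reads "1"
// or the expect timeout expires.
test('add to cart updates the badge (fixed)', async ({ page }) => {
  await page.goto('/cart');
  await page.getByRole('button', { name: 'Add to cart' }).click();
  await expect(page.getByTestId('cart-count')).toHaveText('1');
});
```

The fixed version is shorter as well as more stable, because the assertion replaces both the wait and the manual read.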

Verification

Confirm the fix three ways. First, the --repeat-each=50 --workers=4 loop passes with zero failures across the whole batch. Second, the HTML report shows a flaky count of zero for that spec across several CI runs, not just one green build. Third, open the trace from a passing run in the Playwright Trace Viewer and confirm the assertion now resolves after the application reaches its end state rather than racing it. To lock the gains in, set the right retry and timeout values described in Configuring Retries and Timeouts for Stable CI.

Frequently Asked Questions

How many times should I repeat a test to call it stable?

Fifty repetitions with multiple workers is a practical baseline for most suites. If failures are independent, fifty runs give roughly an 80% chance of catching a one-in-thirty flake, so a clean loop is strong evidence but not proof. For tests that touch shared infrastructure or run rarely, raise the count to a few hundred, since a rarer race needs more attempts to surface reliably.
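The arithmetic behind these counts can be sketched as follows, assuming every run fails independently with the same probability. Real flakes often cluster around load or shared state, so treat the results as rough guides. The function names are illustrative.

```typescript
// Probability that at least one of `runs` independent runs fails,
// given a per-run failure probability `p`.
function detectionProbability(p: number, runs: number): number {
  return 1 - Math.pow(1 - p, runs);
}

// Smallest number of runs that shows at least one failure
// with probability `confidence`.
function runsNeeded(p: number, confidence: number): number {
  return Math.ceil(Math.log(1 - confidence) / Math.log(1 - p));
}

// One-in-thirty flake over 50 runs.
console.log(detectionProbability(1 / 30, 50).toFixed(2)); // prints "0.82"
// One-in-fifty flake over 50 runs.
console.log(detectionProbability(1 / 50, 50).toFixed(2)); // prints "0.64"
// Runs needed to catch a one-in-fifty flake 95% of the time.
console.log(runsNeeded(1 / 50, 0.95)); // prints 149
```

Under this model, a one-in-fifty flake slips through a fifty-run loop about a third of the time, which is why the repeat count should grow as the suspected failure rate shrinks.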

Does --repeat-each run tests in parallel?

It runs the repeated copies across your configured workers like any other tests, so combining --repeat-each with --workers exposes both timing races and shared-state collisions at once. Use a single worker only when you specifically want to isolate timing from concurrency effects.

Why are web-first assertions better than waitForTimeout?

A web-first assertion polls until the condition it checks becomes true or the timeout expires, so it waits only about as long as the application needs. A fixed waitForTimeout is a blind guess that is too short on slow runs and wastes time on fast ones, which is why it is such a common cause of flaky tests.