CI/CD Integration

A Playwright suite that passes on a laptop is only half-finished; the suite that matters runs unattended on every pull request, on clean machines, with no display attached, and returns a verdict fast enough that nobody is tempted to bypass it. Continuous integration changes the constraints. Browsers must be installed and cached so each job does not re-download hundreds of megabytes. Execution must be headless. The wall-clock time of the slowest job — not the total CPU time — gates the merge, so the work has to spread across workers within a machine and shards across machines. And when something fails, the pipeline must hand back a trace, a screenshot, and a video, because nobody can attach a debugger to a job that already exited. This guide covers how to take a green local suite and make it a reliable, fast, observable gate in any CI system, building on Playwright Setup & Core Architecture.

The slowest shard, not the total test time, decides how long the gate takes; merging blob reports turns parallel jobs back into a single pass-or-fail verdict.

Headless execution is the CI default

Playwright runs headless unless told otherwise, which is exactly what a CI runner needs because there is no display server attached. The first thing that breaks when teams move from a laptop to a pipeline is missing operating-system libraries that the browser binaries depend on: fonts, graphics, and audio shared objects that a desktop already has but a minimal CI image does not. Install both the browsers and their OS dependencies in one command, npx playwright install --with-deps, so the runner is provisioned correctly. The configuration below then branches on the CI environment variable to apply the stricter settings a pipeline needs.

import { defineConfig, devices } from '@playwright/test';

// One config, two behaviors: locals get headed-friendly defaults,
// CI gets the strict settings a pipeline needs.
export default defineConfig({
  // Fail the build if a test was accidentally left with test.only.
  forbidOnly: !!process.env.CI,
  // Retry only in CI, where transient infra noise is real.
  retries: process.env.CI ? 2 : 0,
  // Pin worker count in CI for predictable timing; auto-detect locally.
  workers: process.env.CI ? 4 : undefined,
  use: {
    baseURL: process.env.BASE_URL ?? 'http://localhost:3000',
    // Record a trace on the first retry of a failed test: the forensic record.
    trace: 'on-first-retry',
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
  },
  projects: [{ name: 'chromium', use: { ...devices['Desktop Chrome'] } }],
});

The CI environment variable is set automatically by virtually every provider, so a single config branches cleanly between local and pipeline behavior. The forbidOnly flag is a small but high-value guard: it turns an accidentally committed test.only into a failed build instead of a suite that silently runs one test. These settings extend the patterns in Playwright Config & Fixtures, where the base configuration object and fixture lifecycle are defined.

Caching browser binaries

Downloading Chromium, Firefox, and WebKit on every run wastes minutes and bandwidth. Playwright stores binaries in a versioned cache directory, so the cache key must include the installed Playwright version; otherwise a version bump restores browser builds that no longer match, and the install step has to download the right ones again. Key the cache on the resolved version (read it from package-lock.json or the installed package) and restore it before the install step. When the cache hits, npx playwright install becomes a no-op verification; when it misses, the install repopulates the cache for the next run. If you maintain multiple engines, weigh the cache size against run frequency, and consult Cross-Browser Execution to decide which engines actually need to run on every commit versus on a nightly schedule.
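One way to derive such a key is sketched below, assuming the project depends on @playwright/test and uses an npm lockfile. The function names and the playwright-<os>-<version> key layout are illustrative conventions, not anything Playwright or a CI provider prescribes.

```typescript
// The parts of package-lock.json this sketch reads.
interface Lockfile {
  packages?: Record<string, { version?: string }>;     // lockfileVersion 2 and 3
  dependencies?: Record<string, { version?: string }>; // lockfileVersion 1
}

// Return the resolved @playwright/test version, or undefined if it is absent.
function resolvedPlaywrightVersion(lock: Lockfile): string | undefined {
  return (
    lock.packages?.['node_modules/@playwright/test']?.version ??
    lock.dependencies?.['@playwright/test']?.version
  );
}

// Build a cache key that changes whenever the resolved version changes.
// The key layout is a hypothetical convention chosen for this example.
function browserCacheKey(lock: Lockfile, os: string): string {
  const version = resolvedPlaywrightVersion(lock);
  if (!version) {
    throw new Error('@playwright/test not found in lockfile');
  }
  return `playwright-${os}-${version}`;
}
```

In a pipeline, the parsed contents of package-lock.json would be passed as lock and the resulting string used as the key of the provider's cache step. Including the operating system in the key keeps binaries built for one platform from being restored on another.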

Parallel workers within a machine

Inside a single job, Playwright runs test files in parallel across worker processes. Each worker is an isolated process with its own browser instance, so tests in different files cannot share state by accident. The right worker count is bounded by the runner's CPU and memory: oversubscribing causes the browsers to contend for cores, which inflates per-test latency and manufactures timeouts that look like flakiness. A typical two-core CI runner is healthy at two to four workers. Pin the number with workers in the config (as above) rather than letting auto-detection guess on a runner whose reported core count may be misleading.
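If different runners need different pinned values, the choice can be factored into a small helper. This is a minimal sketch: PW_WORKERS is a hypothetical override variable invented for this example, not a Playwright setting, and the fallback of 4 mirrors the config shown earlier.

```typescript
// Resolve the worker count for a Playwright config.
// PW_WORKERS is a hypothetical override variable, not a Playwright setting.
function resolveWorkers(
  env: Record<string, string | undefined>,
): number | undefined {
  // Locally, return undefined so Playwright auto-detects the worker count.
  if (!env.CI) return undefined;
  const override = Number.parseInt(env.PW_WORKERS ?? '', 10);
  // Fall back to the pinned default when the override is missing or invalid.
  return Number.isInteger(override) && override > 0 ? override : 4;
}
```

The config would then read workers: resolveWorkers(process.env), which keeps the number explicit in CI while leaving local runs on auto-detection.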

import { test, expect } from '@playwright/test';

// Mark a single file as serial when its tests genuinely depend on order,
// so the parallel default does not corrupt shared, sequential state.
test.describe.configure({ mode: 'serial' });

test('step one seeds the account', async ({ page }) => {
  await page.goto('/onboarding');
  await expect(page.getByRole('heading', { name: 'Welcome' })).toBeVisible();
});

test('step two builds on the seeded account', async ({ page }) => {
  await page.goto('/dashboard');
  await expect(page.getByRole('row')).toHaveCount(2);
});

Reserve serial mode for the rare file whose tests are inherently ordered; everything else should stay independent so parallelism stays free.

Sharding across machines

Workers parallelize within one runner, but the slowest single runner still gates the merge. Sharding splits the entire test set across several machines that run at the same time, each executing a slice with --shard=index/total. Three shards of four workers each gives you twelve-way concurrency, and the gate finishes in roughly a third of the single-machine wall-clock time. The catch is that each shard produces its own partial result, so a naive setup yields three separate reports and no single verdict. The fix is the blob reporter: each shard writes a machine-readable blob, and a final job downloads every blob and runs merge-reports to produce one HTML report and one exit code. The full mechanics, including the matrix definition and the merge job, are covered in Running Playwright Tests in GitHub Actions with Sharding.
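The commands involved can be sketched as follows. Shard indexes are 1-based, so three shards run 1/3, 2/3, and 3/3. The helper name and the all-blob-reports directory are assumptions made for illustration; the real job layout depends on the CI provider.

```typescript
// Build the command each shard job runs; shard indexes start at 1.
function shardCommands(total: number): string[] {
  if (!Number.isInteger(total) || total < 1) {
    throw new RangeError('total must be a positive integer');
  }
  return Array.from(
    { length: total },
    (_, i) => `npx playwright test --shard=${i + 1}/${total} --reporter=blob`,
  );
}

// Command for the final job, run after every shard's blob has been
// downloaded into one directory. "all-blob-reports" is an assumed name.
const mergeCommand =
  'npx playwright merge-reports --reporter html ./all-blob-reports';
```

Each string returned by shardCommands(3) belongs to one parallel job, and mergeCommand runs once in a job that depends on all of them.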

Artifacts on failure

A failed CI test that leaves nothing behind forces a re-run with more logging, which doubles the feedback loop and may not reproduce the failure at all. Configure the runner to emit a trace on first retry, a screenshot only on failure, and a video retained on failure, then upload those artifacts as part of the job. The trace is the highest-value artifact: it is a self-contained, time-travel recording of the DOM, network, and console that opens in the Trace Viewer. How to capture, store, and read these is the subject of Reporters & Test Artifacts and, more narrowly, Capturing Screenshots and Video on Test Failure. Upload artifacts with if: always() (or the provider's equivalent) so they survive even when the test step fails, which is precisely when you need them.

Flake handling in pipelines

CI amplifies flakiness because it runs more often, on busier hardware, against shared environments. Two retries in CI absorb genuinely transient noise — a slow cold start, a momentary network blip — without masking real regressions, because Playwright reports the test as flaky when a retry rescues a first-attempt failure. Treat that "flaky" verdict as a defect to triage, not a pass to ignore: a test that needs a retry is telling you about a missing wait or a non-deterministic dependency. Surface the flaky count in the merged report, and route persistent offenders to Flaky Test Management for root-cause work. The two most common fixes — deterministic synchronization and bounded retries — are detailed in Detecting and Fixing Flaky Playwright Tests and Configuring Retries and Timeouts for Stable CI.
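The rule described above can be written down as a small classifier. This is an illustrative model of the verdict logic, not Playwright's internal implementation, and the type and function names are made up for the example.

```typescript
type AttemptResult = 'passed' | 'failed';
type Verdict = 'passed' | 'flaky' | 'failed';

// Classify a test from its ordered attempts (first run, then any retries).
function classify(attempts: AttemptResult[]): Verdict {
  if (attempts.length === 0) throw new Error('no attempts recorded');
  const last = attempts[attempts.length - 1];
  if (last === 'failed') return 'failed';
  // The final attempt passed: flaky if any earlier attempt failed.
  const rescued = attempts.slice(0, -1).some((a) => a === 'failed');
  return rescued ? 'flaky' : 'passed';
}

// Count verdicts so the flaky total can be shown next to pass and fail.
function summarize(runs: AttemptResult[][]): Record<Verdict, number> {
  const counts: Record<Verdict, number> = { passed: 0, flaky: 0, failed: 0 };
  for (const attempts of runs) counts[classify(attempts)] += 1;
  return counts;
}
```

Keeping flaky as its own bucket, rather than folding it into passed, is what makes the retry a signal instead of a silent rescue.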

Containerizing the runner

The most reproducible CI environment is a container built on the official Playwright image, which ships the browsers and every OS dependency already installed and version-matched. This removes the entire class of missing-library failures and makes the same image usable locally, in CI, and in a self-hosted runner. The Dockerfile, the --ipc=host flag that keeps Chromium from running out of shared memory and crashing, and the user-permission pitfalls are walked through in Dockerizing Playwright for Headless CI.

Frequently Asked Questions

How many workers and shards should I use?

Set workers to match the CPU of a single runner — two to four on a typical two-core CI machine — and add shards to cut wall-clock time across machines. Total concurrency is workers multiplied by shards, so three shards of four workers gives twelve-way parallelism. Add shards until the per-job overhead of installing browsers stops being worth the time saved.
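A rough way to see where extra shards stop paying off is to model each job as a fixed setup cost plus its share of the suite. The sketch below assumes tests split evenly and parallelize perfectly, which real suites only approximate; the function name and all figures are hypothetical.

```typescript
// Rough model: every shard pays a fixed setup cost, then runs its slice
// of the suite across its workers. Assumes an even, perfectly parallel split.
function estimateWallClockSeconds(
  totalTestSeconds: number,
  workers: number,
  shards: number,
  setupSeconds: number,
): number {
  return setupSeconds + totalTestSeconds / (workers * shards);
}

// Hypothetical suite: 2400 s of test time, 4 workers, 60 s setup per job.
const oneShard = estimateWallClockSeconds(2400, 4, 1, 60);    // 660
const threeShards = estimateWallClockSeconds(2400, 4, 3, 60); // 260
const sixShards = estimateWallClockSeconds(2400, 4, 6, 60);   // 160
```

Under these assumed numbers, going from one shard to three saves 400 seconds, while going from three to six saves only 100 more, because the 60 seconds of setup is paid by every job regardless of how small its slice is.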

Why do my browsers re-download on every CI run?

The browser cache key does not match between runs, so the restore step misses every time. Key the cache on the resolved Playwright version so a version bump invalidates the cache deliberately and an unchanged version reuses the binaries. After a cache hit, the install command only verifies the binaries instead of downloading them.

Should CI retries be allowed at all?

Yes, a small bounded number such as two, but only in CI and only as a signal. A retry that rescues a failing test marks it flaky, which is a defect to fix rather than a result to celebrate. Track the flaky count and treat a persistently flaky test as a missing wait or a non-deterministic dependency.