Anti-Bot Defenses & Rate Limiting

Servers protect themselves from automated traffic for good reasons: a careless scraper can degrade a site for real users, distort analytics, and run up someone else's infrastructure bill. The professional response is not to fight those defenses but to be the kind of client they are happy to serve — one that reads robots.txt, stays under documented rate limits, slows down when asked, and never hammers an endpoint in a tight loop. This guide treats scraping as reliability engineering: how to pace requests, cap concurrency, back off exponentially when a server signals overload, and handle 429 and 503 responses correctly. The goal throughout is respectful, robust automation. It sits under Web Scraping & Data Extraction and is detailed step by step in Handling Rate Limits and Retries When Scraping.

A well-behaved client checks limits before sending and backs off with jitter when the server signals overload.

Read robots.txt and honor its rules

The first request a respectful scraper makes is for robots.txt. It tells you which paths the site owner permits automated agents to fetch, and some sites also include a Crawl-delay directive that sets a minimum interval between requests. Crawl-delay is a widely seen extension rather than part of the core robots.txt standard, so not every crawler honors it, but a respectful scraper should. Treat both as binding: skip disallowed paths entirely, and if a crawl delay is present, make it the floor of your pacing. Many sites also publish terms of service and a documented public API; where an API exists, prefer it over scraping the rendered page, because it is the channel the owner built for programmatic access and it usually carries explicit rate-limit headers. Reading and obeying robots.txt costs one request and is the clearest signal that your automation is acting in good faith.
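As a rough illustration of what honoring these rules involves, the sketch below parses a robots.txt body and answers two questions: may this path be fetched, and is there a crawl delay? The names parseRobots, isAllowed, and RobotsRules are hypothetical helpers invented for this example, and the matching is deliberately simplified to plain path prefixes (no `*` or `$` wildcards). A production crawler would more likely rely on a maintained parser.

```typescript
// Simplified robots.txt handling: User-agent groups, Allow, Disallow and
// Crawl-delay with plain prefix matching. Wildcards are NOT handled.
interface RobotsRules {
  allow: string[];
  disallow: string[];
  crawlDelaySec?: number;
}

function parseRobots(text: string, agent: string): RobotsRules {
  const groups = new Map<string, RobotsRules>();
  let current: string[] = []; // agents named by the group being read
  let inRules = false;        // true once the current group has seen a rule

  for (const raw of text.split(/\r?\n/)) {
    const line = raw.replace(/#.*$/, '').trim(); // drop comments
    const sep = line.indexOf(':');
    if (sep < 0) continue;
    const key = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (key === 'user-agent') {
      if (inRules) { current = []; inRules = false; } // a new group starts
      const name = value.toLowerCase();
      current.push(name);
      if (!groups.has(name)) groups.set(name, { allow: [], disallow: [] });
      continue;
    }
    if (key !== 'allow' && key !== 'disallow' && key !== 'crawl-delay') continue;
    inRules = true;
    for (const name of current) {
      const rules = groups.get(name)!;
      if (key === 'allow' && value) rules.allow.push(value);
      else if (key === 'disallow' && value) rules.disallow.push(value);
      else if (key === 'crawl-delay' && value) {
        const n = Number(value);
        if (Number.isFinite(n) && n >= 0) rules.crawlDelaySec = n;
      }
    }
  }

  // Prefer a group that names this agent; otherwise fall back to "*".
  const wanted = agent.toLowerCase();
  const match = Array.from(groups.keys()).find(
    (name) => name !== '*' && name !== '' && wanted.includes(name),
  );
  return groups.get(match ?? '*') ?? { allow: [], disallow: [] };
}

function isAllowed(rules: RobotsRules, path: string): boolean {
  const longest = (prefixes: string[]): number =>
    prefixes
      .filter((p) => path.startsWith(p))
      .reduce((max, p) => Math.max(max, p.length), -1);
  // The longest matching prefix wins; Allow wins a tie.
  return longest(rules.allow) >= longest(rules.disallow);
}
```

A caller would fetch robots.txt once per host, skip every path for which isAllowed returns false, and use crawlDelaySec (when present) as the minimum interval for pacing.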

Pace requests deliberately

The single most important habit is to insert a deliberate, bounded delay between requests rather than firing them as fast as the event loop allows. A fixed minimum interval (say one request per second, or whatever the site's crawl delay specifies) keeps your traffic comparable to steady, low-volume use and protects the server's headroom. Pacing also makes runs more reproducible: many timing-dependent failures disappear when you are not racing the server. Add a small random offset on top of the interval so requests are not perfectly periodic, which smooths bursts and avoids lining up with the server's own measurement windows.

import { test } from '@playwright/test';

// A simple minimum-interval pacer: never send faster than one request per interval.
// Intended for sequential use; concurrent callers would need a shared slot reservation.
class Pacer {
  private last = 0;
  constructor(private minIntervalMs: number) {}
  async wait(): Promise<void> {
    const elapsed = Date.now() - this.last;
    const remaining = this.minIntervalMs - elapsed;
    // Sleep only if we are ahead of schedule.
    if (remaining > 0) await new Promise((r) => setTimeout(r, remaining));
    this.last = Date.now();
  }
}

test('paces requests to one per second', async ({ page }) => {
  const pacer = new Pacer(1000); // honor at least a 1s gap
  for (const path of ['/p/1', '/p/2', '/p/3']) {
    await pacer.wait(); // block until the interval has elapsed
    await page.goto(path); // relative paths assume baseURL is set in the Playwright config
  }
});
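The Pacer above enforces a fixed interval, so it does not yet include the randomization described earlier. One way to layer it on, as a minimal sketch: compute each delay as the base interval plus a small random offset. The helper names jitteredDelay, sleep, and pacedLoop are hypothetical, and the injectable rand parameter exists only so the function can be checked deterministically.

```typescript
// Returns a delay in [baseMs, baseMs + spreadMs). The base interval is a
// floor and is never shortened, so a crawl delay passed as baseMs is respected.
function jitteredDelay(
  baseMs: number,
  spreadMs: number,
  rand: () => number = Math.random,
): number {
  return baseMs + Math.floor(rand() * spreadMs);
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical usage: visit each path, pausing baseMs plus jitter in between.
async function pacedLoop(
  paths: string[],
  visit: (path: string) => Promise<void>,
  baseMs = 1000,
  spreadMs = 250,
): Promise<void> {
  for (let i = 0; i < paths.length; i++) {
    if (i > 0) await sleep(jitteredDelay(baseMs, spreadMs));
    await visit(paths[i]);
  }
}
```

Adding jitter only on top of the base keeps the pacing conservative: randomization spreads requests out but never brings two of them closer than the agreed minimum.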

Cap concurrency

Parallelism is where good intentions turn into accidental denial of service. Twenty workers each firing without coordination can hit a server with twenty simultaneous requests, far above any reasonable limit. Cap the number of in-flight requests with a small concurrency pool — often two to four for a single host — and let workers wait for a slot before proceeding. When you do parallelize, isolate each worker with its own Browser Contexts & Isolation so sessions and cookies do not bleed across requests, and keep the aggregate rate, not just the per-worker rate, under the documented limit.

import { test, expect } from '@playwright/test';

// Run an array of async tasks with at most `limit` running at once.
async function withConcurrency<T>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<void>,
): Promise<void> {
  const queue = [...items];
  // Spawn exactly `limit` runners that each pull from the shared queue.
  const runners = Array.from({ length: limit }, async () => {
    let next: T | undefined;
    while ((next = queue.shift()) !== undefined) {
      await worker(next);
    }
  });
  await Promise.all(runners);
}

test('caps in-flight requests at three', async ({ browser }) => {
  const urls = ['/a', '/b', '/c', '/d', '/e'];
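  await withConcurrency(urls, 3, async (url) => {
    const ctx = await browser.newContext(); // isolate each worker
    try {
      const page = await ctx.newPage();
      await page.goto(url); // relative URLs assume baseURL is configured
    } finally {
      await ctx.close(); // release the context even if navigation fails
    }
  });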
  expect(urls.length).toBe(5);
});
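The pool above bounds how many requests are in flight, but not how quickly new ones start, so the aggregate rate can still exceed a documented limit. A minimal sketch of one way to close that gap is a pacer that all workers share, which hands out start times one slot apart. SharedPacer and its reserve method are hypothetical names for this example; the optional now argument exists only to make the behavior easy to verify.

```typescript
// A pacer that is safe to share between concurrent workers. Each call
// reserves the next free slot synchronously (before any await), so two
// workers can never claim the same slot.
class SharedPacer {
  private nextSlot = 0;
  constructor(private readonly minIntervalMs: number) {}

  // Reserve the next slot and return how long the caller must wait for it.
  reserve(now: number = Date.now()): number {
    const slot = Math.max(now, this.nextSlot);
    this.nextSlot = slot + this.minIntervalMs;
    return slot - now;
  }

  // Resolve once this caller's reserved slot has arrived.
  wait(): Promise<void> {
    const delay = this.reserve();
    return new Promise((resolve) => setTimeout(resolve, delay));
  }
}
```

Under these assumptions, each worker would call `await pacer.wait()` on one shared SharedPacer instance immediately before navigating, so three workers with a 1000 ms interval still start at most one request per second between them.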

Back off when the server signals overload

A 429 (Too Many Requests) or 503 (Service Unavailable) is the server telling you to slow down. The correct response is to wait and retry, not to repeat immediately. If the response includes a Retry-After header, treat it as authoritative and wait at least that long; the value may be either a number of seconds or an HTTP date, so parse both forms. Otherwise, use exponential backoff: double the wait after each failed attempt, and add jitter so a fleet of clients does not retry in lockstep and create a thundering herd. Cap the number of retries and the maximum delay so a persistently failing endpoint surfaces as an error instead of looping forever. The full implementation (detecting 429, reading Retry-After, exponential backoff with jitter, and throttling) is in Handling Rate Limits and Retries When Scraping.
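The pieces described above can be sketched compactly. This is an illustrative outline, not the implementation from the linked guide: parseRetryAfter, backoffDelay, withBackoff, and the Attempt shape are hypothetical names, the 500 ms base and 30 s cap are arbitrary assumptions, and the jitter scheme (half fixed, half random) is one common variant among several.

```typescript
// Retry-After is either delay-seconds or an HTTP date. Returns milliseconds,
// or undefined when the header is absent or unparseable.
function parseRetryAfter(
  value: string | null | undefined,
  now: number = Date.now(),
): number | undefined {
  if (!value) return undefined;
  const trimmed = value.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  const date = Date.parse(trimmed);
  return Number.isNaN(date) ? undefined : Math.max(0, date - now);
}

// Exponential backoff with jitter: the ceiling doubles per attempt up to
// capMs; the delay is half the ceiling plus a random share of the other half.
function backoffDelay(
  attempt: number,
  baseMs = 500,
  capMs = 30000,
  rand: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  const half = ceiling / 2;
  return Math.floor(half + rand() * half);
}

// Minimal response shape assumed for this sketch.
interface Attempt {
  status: number;
  retryAfter?: string | null;
}

async function withBackoff(
  send: () => Promise<Attempt>,
  maxRetries = 5,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<Attempt> {
  for (let attempt = 0; ; attempt++) {
    const res = await send();
    if (res.status !== 429 && res.status !== 503) return res;
    if (attempt >= maxRetries) {
      throw new Error(`gave up after ${attempt + 1} attempts (HTTP ${res.status})`);
    }
    // The server's own hint takes precedence over the computed backoff.
    const delay = parseRetryAfter(res.retryAfter) ?? backoffDelay(attempt);
    await sleep(delay);
  }
}
```

Keeping the delay calculation separate from the retry loop makes both easy to test in isolation, and the retry cap turns a persistently overloaded endpoint into a visible error rather than an endless loop.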

Treat retries as a reliability discipline

Backoff and retry logic is the same machinery that stabilizes a flaky end-to-end suite, and the mindset transfers directly from Flaky Test Management: distinguish transient failures (429, 503, network blips) that deserve a retry from deterministic failures (404, 403, a parse error) that do not. Retrying a 404 just wastes the server's time and yours. Log every retry with its reason and delay so a run is auditable, and retry only idempotent requests so a repeated attempt never double-submits a side effect. A scraper that retries thoughtfully is both kinder to the server and more reliable for you.
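The distinction between transient and deterministic failures can be made explicit in code. The sketch below is one hypothetical way to do it: classify, logRetry, and RetryLogEntry are names invented for this example, the transient set contains only the two statuses named above, and the idempotent method list follows standard HTTP semantics.

```typescript
type Outcome = 'ok' | 'retry' | 'fail';

// Statuses this guide treats as transient.
const TRANSIENT_STATUSES = new Set<number>([429, 503]);
// Methods defined as idempotent by HTTP semantics; repeating them is safe.
const IDEMPOTENT_METHODS = new Set<string>(['GET', 'HEAD', 'OPTIONS', 'PUT', 'DELETE']);

function classify(method: string, status: number): Outcome {
  if (status >= 200 && status < 400) return 'ok';
  if (TRANSIENT_STATUSES.has(status) && IDEMPOTENT_METHODS.has(method.toUpperCase())) {
    return 'retry';
  }
  // 404, 403, and any failure of a non-idempotent request surface immediately.
  return 'fail';
}

// One auditable record per retry: what was retried, why, and for how long.
interface RetryLogEntry {
  url: string;
  attempt: number;
  reason: string;
  delayMs: number;
  at: string; // ISO 8601 timestamp
}

function logRetry(
  log: RetryLogEntry[],
  url: string,
  attempt: number,
  reason: string,
  delayMs: number,
): void {
  log.push({ url, attempt, reason, delayMs, at: new Date().toISOString() });
}
```

Network-level errors (timeouts, connection resets) never produce a status code, so a fuller version would classify thrown exceptions as well; that part is left out here to keep the sketch short.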

Identify your client honestly

Send a descriptive User-Agent that names your automation and a contact address, so a site operator who notices your traffic can reach you rather than block you blind. Combined with honoring robots.txt, conservative pacing, and capped concurrency, an honest identity marks your scraper as a cooperative client. This is the opposite of evasion: the aim is to be legible and easy to work with, which is what keeps long-running data collection sustainable.
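A small helper can keep the identity string consistent across a project. In this sketch the function buildUserAgent, the product name, the info URL, and the contact address are all placeholders chosen for illustration; the `name/version (+url; email)` layout is a common convention rather than a requirement.

```typescript
// Build a descriptive User-Agent: product/version plus a way to reach the operator.
function buildUserAgent(
  name: string,
  version: string,
  infoUrl: string,
  contactEmail: string,
): string {
  return `${name}/${version} (+${infoUrl}; ${contactEmail})`;
}

// Placeholder values; substitute your own project details.
const userAgent = buildUserAgent(
  'example-scraper',
  '1.0',
  'https://example.com/bot',
  'ops@example.com',
);
// userAgent is "example-scraper/1.0 (+https://example.com/bot; ops@example.com)"
```

With Playwright, the resulting string can be passed as the userAgent option when creating a context, for example `browser.newContext({ userAgent })`, so every page in that context identifies itself the same way.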

Frequently Asked Questions

What should I do when I get an HTTP 429 response?

Stop sending new requests to that host and wait before retrying. If the response carries a Retry-After header, wait at least as long as it specifies; otherwise wait an exponentially increasing interval with jitter, doubling after each failure up to a cap. Treat the 429 as the server's request to slow down, and reduce your overall pacing for the rest of the run.

How many concurrent requests is it safe to make to one site?

There is no universal number, but a small pool of two to four in-flight requests per host is a conservative default that respects most servers. Always keep your aggregate rate under any limit the site documents in its headers or robots.txt crawl delay, and reduce concurrency immediately if you start seeing 429 or 503 responses.

Does honoring robots.txt and rate limits make scraping reliable as well as polite?

Yes. The same habits that respect a server also stabilize your run: deliberate pacing removes timing-dependent failures, capped concurrency prevents self-inflicted overload, and backoff with jitter recovers cleanly from transient errors instead of compounding them. Respectful automation and robust automation are the same engineering.