Crawl budget

Definition

Crawl budget is the combination of crawl capacity (how much Googlebot can crawl without overloading your server) and crawl demand (how often Google wants to recrawl your URLs). Under roughly 10,000 URLs it rarely matters; above that scale it shapes which pages get crawled at all, and an uncrawled page never reaches the index.

Long definition

Crawl budget is not a single allocation you spend down. It sits at the intersection of two signals that move independently.

Crawl capacity is the ceiling set by your server. Googlebot raises its request rate while responses are fast and return 200s, and throttles back the moment it sees 5xx errors or slow TTFBs. A site behind a shared CDN with cold caches hits this ceiling quickly. Note that Google ignores the Crawl-delay directive in robots.txt; some other crawlers, such as Bing, honor it.
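
You can gauge which side of the capacity ceiling you are on from your access logs. A minimal sketch, assuming a combined-format log in a local file named access.log (both the format and the filename are assumptions):

```python
import re
from collections import Counter

# Combined log format:
# IP - - [time] "METHOD /path HTTP/1.1" status bytes "referer" "user-agent"
LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(\S+) (\S+) [^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"'
)

statuses = Counter()
with open("access.log") as log:               # assumed filename
    for line in log:
        m = LINE.match(line)
        if m and "Googlebot" in m.group(4):   # group 4 = user-agent
            statuses[m.group(3)] += 1         # group 3 = status code

total = sum(statuses.values())
for status, count in statuses.most_common():
    print(f"{status}: {count} ({count / total:.1%})")
# A growing share of 5xx responses is the classic trigger for Googlebot
# throttling its request rate, i.e. lowering your crawl capacity.
```

(Verifying that hits claiming to be Googlebot really come from Google, for example via reverse DNS, is a separate step this sketch skips.)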

Crawl demand is how badly Google wants to recrawl a URL. It rises for pages that change often, have strong inbound link signals, or were recently discovered. It falls for URLs Google has decided are low value, duplicate, or stale.

Capacity and demand together are what SEO jargon calls "crawl budget". For a site under 10,000 URLs it almost never matters: Google will crawl you faster than you can publish. The constraint starts to bite at hundreds of thousands to tens of millions of URLs, or on sites that mint endless URL combinations through faceted navigation, session IDs, or calendar archives.
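
To see how quickly faceted navigation mints URLs, multiply the options out. A toy sketch; the facet names and value counts below are invented:

```python
from itertools import product
from urllib.parse import urlencode

# Hypothetical facets on a single category page (all values invented).
facets = {
    "color": ["red", "blue", "green", "black"],
    "size":  ["s", "m", "l", "xl"],
    "brand": [f"brand{i}" for i in range(10)],
    "sort":  ["price_asc", "price_desc", "newest"],
    "page":  [str(n) for n in range(1, 21)],
}

# Every combination of parameter values is a distinct crawlable URL.
combos = list(product(*facets.values()))
print(f"{len(combos):,} URL variants of one category page")   # 9,600

example = "/shoes?" + urlencode(dict(zip(facets, combos[0])))
print(example)  # /shoes?color=red&size=s&brand=brand0&sort=price_asc&page=1
```

Five modest parameters turn one category into 9,600 crawlable URLs; a few hundred categories later you are in the millions, where crawl budget stops being theoretical.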

Common misconceptions

  • "Every site has crawl budget problems." Most don't. Under 10k URLs, your problem is almost always content quality or internal linking, not capacity.
  • "Crawl budget is URLs per day." It's capacity × demand, and both are dynamic. A site might see 50,000 crawls on Monday and 5,000 on Friday with nothing changed on your side.
  • "Blocking in robots.txt saves crawl budget." It prevents fetching the body, but Googlebot still spends capacity on the disallow check and the URL discovery step. For faceted URL noise, fixing the source (rel=canonical, parameter handling, nofollow on internal links) beats a robots.txt bandage.