Crawl trap

Definition

A crawl trap is any URL pattern that generates an unbounded or pathologically large set of URLs. Common sources: infinite calendar archives, faceted navigation combinatorics, session IDs in query strings, and recursive relative-link bugs. Crawl traps drain crawl budget and frustrate indexing of pages that actually matter.

Long definition

Four traps account for most real-world cases.

Calendar archives. A "next month" link on a blog calendar widget with no end date lets Googlebot crawl 2026, 2027, 2028, and on indefinitely, finding only empty pages. Common on legacy WordPress installs.

Faceted nav without bounds. Combinatorial filters with no canonical strategy. Five facets with ten values each already allow 161,051 filter states, and sort orders plus pagination multiply that into millions of URLs; see the sketch below.
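
A back-of-the-envelope count makes the blow-up concrete. This is a minimal sketch; the facet, sort, and pagination numbers are illustrative assumptions, not measurements:

    # Rough crawlable-URL count for a 5-facet category page.
    # Every number below is an assumption chosen for illustration.
    facets, values_per_facet = 5, 10

    # Each facet is either unset or set to one of its values.
    filter_combos = (values_per_facet + 1) ** facets        # 161,051

    sort_orders, pages_per_listing = 4, 20                  # assumed
    print(filter_combos * sort_orders * pages_per_listing)  # 12,884,080 URLs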

Session IDs in URLs. /?sid=abc123 rotated per visit makes every page appear unique, so Googlebot crawls hundreds of versions of the same content. Cookies are the modern fix; PHP's session.use_trans_sid, which appended PHPSESSID= to every URL, is the historical disaster.
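
A common defensive normalization is to strip known session parameters before comparing or canonicalizing URLs. A minimal sketch in Python; the names in SESSION_PARAMS are assumptions to extend per site:

    # Collapse session-ID variants of a URL onto one canonical form.
    from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

    SESSION_PARAMS = {"sid", "phpsessid", "jsessionid"}  # assumed names

    def canonicalize(url):
        parts = urlsplit(url)
        kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
                if k.lower() not in SESSION_PARAMS]
        return urlunsplit(parts._replace(query=urlencode(kept)))

    print(canonicalize("https://example.com/page?sid=abc123&p=2"))
    # https://example.com/page?p=2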

Recursive relative links. A bug where /foo/bar/ contains a relative link to bar/, generating /foo/bar/bar/, /foo/bar/bar/bar/, and so on. Common in poorly-templated breadcrumb code.
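
You can watch the resolution happen with Python's urllib.parse.urljoin; the /foo/bar/ paths are the same illustrative ones as above:

    # How a crawler resolves a relative vs. an absolute breadcrumb link.
    from urllib.parse import urljoin

    page = "https://example.com/foo/bar/"
    print(urljoin(page, "bar/"))       # https://example.com/foo/bar/bar/  <- the trap
    print(urljoin(page, "/foo/bar/"))  # https://example.com/foo/bar/      <- stable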

How you find a crawl trap:

  • Log file analysis. Look for Googlebot fetching URLs that follow predictable infinite patterns. Aggregate by path stem rather than exact URL, since each trap URL may be fetched only once; the trap stem surfaces in the top 100 (see the sketch after this list).
  • GSC Page indexing (formerly Coverage). A spike in "Crawled, currently not indexed" or "Discovered, currently not indexed" with similar URL stems.
  • Crawl the site with your own crawler with no max-depth limit and watch the URL count grow without bound.
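
A minimal sketch of the log-analysis pass, assuming a combined-format access log; the access.log path, the regex field positions, and the depth and top cutoffs are all assumptions to adjust for your server:

    # Surface crawl-trap stems: count Googlebot fetches per path prefix,
    # so one-off trap URLs collapse onto a shared stem.
    import re
    from collections import Counter
    from urllib.parse import urlsplit

    REQUEST = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" \d{3} \S+ "[^"]*" "([^"]*)"')

    def trap_stems(log_path, depth=3, top=100):
        stems = Counter()
        with open(log_path, encoding="utf-8", errors="replace") as fh:
            for line in fh:
                m = REQUEST.search(line)
                if not m or "Googlebot" not in m.group(2):
                    continue
                path = urlsplit(m.group(1)).path  # drop the query string
                segments = [s for s in path.split("/") if s][:depth]
                stems["/" + "/".join(segments)] += 1
        return stems.most_common(top)

    for stem, hits in trap_stems("access.log"):
        print(f"{hits:8d}  {stem}")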

How you fix it:

  • At the source. Remove the unbounded link generation. A calendar widget should stop at the first published post and the current month (see the sketch after this list). A breadcrumb should use absolute paths.
  • nofollow on internal links into the trap as a stopgap. Google has treated it as a hint rather than a directive since 2019, but it still helps.
  • robots.txt disallow is containment, not a fix. The trap links keep being generated and discovered; you hide the symptom without removing the source.
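
For the calendar case, the source-level fix is to bound both archive links. A sketch, assuming the CMS can report its earliest publish date (FIRST_POST is a stand-in):

    # Calendar widget links that stop at the archive's real bounds.
    import datetime as dt

    FIRST_POST = dt.date(2019, 3, 1)  # assumed: earliest publish date from the CMS

    def month_start(d):
        return d.replace(day=1)

    def next_month_url(current, today):
        nxt = month_start(month_start(current) + dt.timedelta(days=32))
        if nxt > month_start(today):
            return None  # omit the link past the current month: forward trap closed
        return f"/archive/{nxt:%Y/%m}/"

    def prev_month_url(current):
        prev = month_start(month_start(current) - dt.timedelta(days=1))
        if prev < month_start(FIRST_POST):
            return None  # omit the link before the first post: backward trap closed
        return f"/archive/{prev:%Y/%m}/"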

Common misconceptions

  • "It only matters for huge sites." A 100-page blog with a buggy calendar can spawn 10,000+ trap URLs. Crawl traps scale with the bug, not with site size.
  • "Disallow makes it go away." It stops indexing, not discovery. Trap URLs still appear in Discovered – currently not indexed and waste GSC report space.
  • "My CMS would catch this." Most popular CMSes ship with crawl traps out of the box (calendar plugins, faceted nav modules, parameter explosions). Audit, don't trust.