Crawl trap
A crawl trap is any URL pattern that generates an unbounded or pathologically large set of URLs. Common sources: infinite calendar archives, faceted navigation combinatorics, session IDs in query strings, and recursive relative-link bugs. Crawl traps drain crawl budget and frustrate indexing of pages that actually matter.
Long definition
Four traps account for most real-world cases.
Calendar archives. A "next month" link on a blog calendar widget with no end date lets Googlebot crawl into 2026, 2027, 2028 indefinitely, finding only empty pages. Common on legacy WordPress installs.
Faceted nav without bounds. Combinatorial filters with no canonical strategy. A 5-facet category page can yield millions of URLs.
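The combinatorics are easy to underestimate. A rough sketch of the math, assuming each facet is optional and single-select (so every facet contributes its value count plus one "not applied" choice; the facet counts below are illustrative, not from any real site):

```python
def facet_url_count(values_per_facet):
    """Upper bound on distinct filter-combination URLs."""
    total = 1
    for values in values_per_facet:
        total *= values + 1  # each facet: its values, or not applied
    return total - 1  # exclude the unfiltered base page

# Five facets with 20 values each:
print(facet_url_count([20, 20, 20, 20, 20]))  # → 4084100
```

Over four million URLs from one category template, before counting sort orders or pagination.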
Session IDs in URLs. A /?sid=abc123 parameter rotated per visit makes every page appear unique, so Googlebot crawls hundreds of versions of the same content. Cookies are the modern fix; PHP's PHPSESSID= in URLs is the historical disaster.
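If you must detect or clean these in a crawler or canonicalization layer, a minimal sketch: strip known session parameters before comparing URLs. The parameter names here ("sid", "phpsessid") match the examples above; extend the set for your stack.

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

SESSION_PARAMS = {"sid", "phpsessid"}  # assumed names; adapt to your site

def strip_session_params(url):
    """Return the URL with session-tracking query parameters removed."""
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in SESSION_PARAMS]
    return urlunparse(parts._replace(query=urlencode(kept)))

print(strip_session_params("https://example.com/page?sid=abc123&color=red"))
# → https://example.com/page?color=red
```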
Recursive relative links. A bug where /foo/bar/ contains a relative link to bar/, generating /foo/bar/bar/, /foo/bar/bar/bar/, and so on. Common in poorly-templated breadcrumb code.
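You can see the recursion with nothing but standard URL resolution: a relative href of "bar/" on a page under /foo/bar/ resolves one level deeper every time it is followed.

```python
from urllib.parse import urljoin

# Each "crawl step" follows the same relative link the buggy template emits.
url = "https://example.com/foo/bar/"
for _ in range(3):
    url = urljoin(url, "bar/")
    print(url)
# https://example.com/foo/bar/bar/
# https://example.com/foo/bar/bar/bar/
# https://example.com/foo/bar/bar/bar/bar/
```

Every resolved URL returns a valid page (the template renders regardless of depth), so the crawler never hits a 404 that would stop it.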
How you find a crawl trap:
- Log file analysis — look for Googlebot fetching URLs that follow predictable infinite patterns. Sort by URL frequency; the trap appears in the top 100.
- GSC Coverage — spike in "Crawled – currently not indexed" or "Discovered – currently not indexed" with similar URL stems.
- Crawl with your own crawler without max-depth and watch the URL count grow without bound.
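The log-analysis step above can be sketched in a few lines. The log format (Apache/Nginx combined log, request path in the seventh whitespace-separated field) and the simple "Googlebot" substring filter are assumptions; adapt both to your server, and verify Googlebot by reverse DNS in production.

```python
from collections import Counter

def top_crawled_paths(log_lines, n=100):
    """Count Googlebot fetches per path; traps surface at the top."""
    counts = Counter()
    for line in log_lines:
        if "Googlebot" not in line:
            continue
        fields = line.split()
        if len(fields) > 6:
            counts[fields[6]] += 1  # request path in combined log format
    return counts.most_common(n)

sample = [
    '66.249.66.1 - - [01/Jan/2025:00:00:00 +0000] "GET /foo/bar/bar/bar/ HTTP/1.1" 200 512 "-" "Googlebot"',
    '66.249.66.1 - - [01/Jan/2025:00:00:01 +0000] "GET /foo/bar/bar/bar/ HTTP/1.1" 200 512 "-" "Googlebot"',
    '66.249.66.1 - - [01/Jan/2025:00:00:02 +0000] "GET /about/ HTTP/1.1" 200 512 "-" "Googlebot"',
]
print(top_crawled_paths(sample, 2))
# → [('/foo/bar/bar/bar/', 2), ('/about/', 1)]
```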
How you fix it:
- At the source. Remove the unbounded link generation. A calendar widget should stop at the first published post. A breadcrumb should use absolute paths.
- nofollow on internal links into the trap works as a stopgap. Google now treats it as a hint, but it still helps.
- robots.txt disallow is containment, not a fix. Googlebot still spends capacity on the disallow check.
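Fixing at the source is usually a small template change. A minimal sketch of a bounded calendar link generator, stopping backward navigation at the first published post (the function, route pattern, and FIRST_POST_MONTH constant are hypothetical; a real widget would read the earliest post date from the database):

```python
import datetime

FIRST_POST_MONTH = datetime.date(2019, 3, 1)  # assumed earliest content

def prev_month_link(current: datetime.date):
    """Return the previous-month archive URL, or None at the boundary."""
    prev = (current.replace(day=1) - datetime.timedelta(days=1)).replace(day=1)
    if prev < FIRST_POST_MONTH:
        return None  # render no link: the crawl stops here
    return f"/archive/{prev.year}/{prev.month:02d}/"

print(prev_month_link(datetime.date(2019, 4, 15)))  # → /archive/2019/03/
print(prev_month_link(datetime.date(2019, 3, 15)))  # → None
```

The symmetric "next month" link should stop at the current month, so neither direction generates unbounded URLs.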
Common misconceptions
- "It only matters for huge sites." A 100-page blog with a buggy calendar can spawn 10,000+ trap URLs. Crawl traps scale with the bug, not with site size.
- "Disallow makes it go away." It stops indexing, not discovery. Trap URLs still appear in Discovered – currently not indexed and waste GSC report space.
- "My CMS would catch this." Most popular CMSes ship with crawl traps out of the box (calendar plugins, faceted nav modules, parameter explosions). Audit, don't trust.