Technical SEO · Glossary · Updated Apr 2026

Robots.txt

Definition

Robots.txt is a plain-text file at the root of a host (`/robots.txt`) that tells crawlers which paths they may or may not fetch. It controls crawling, not indexing — a blocked URL can still appear in search results if it has inbound links.

Long definition

Robots.txt follows the Robots Exclusion Protocol, an informal convention dating to 1994 that was finally codified as RFC 9309 in 2022, almost three decades later. It lives at the host root — https://example.com/robots.txt — and is fetched per host (scheme, hostname, and port all matter) before a crawler starts fetching content.
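The per-host rule means every subdomain and scheme has its own file. A minimal sketch of deriving the robots.txt URL for any page URL (`robots_url` is a hypothetical helper name, not part of any library):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    """Return the robots.txt URL governing `page_url`.

    robots.txt is scoped to scheme + host + port, so the path,
    query, and fragment of the page URL are discarded.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://blog.example.com/posts/1?x=2"))
# → https://blog.example.com/robots.txt
```

Note that https://blog.example.com/robots.txt and https://example.com/robots.txt are entirely independent files — rules on the apex host say nothing about the subdomain.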

Core syntax:

User-agent: Googlebot
Disallow: /cart/
Disallow: /search

User-agent: *
Disallow: /admin/
Allow: /admin/public/

Sitemap: https://example.com/sitemap.xml

Group selection is by specificity, not position: a crawler obeys the User-agent line that most specifically matches its name (Googlebot follows the Googlebot group above and ignores the * group). Within the chosen group, the most specific Allow/Disallow rule wins — a longer path match beats a shorter one. `Disallow:` with an empty value means "allow everything", which is also the default when no rule matches.
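The longest-match logic can be sketched in a few lines. This is an illustrative simplification assuming rules for one user-agent group have already been parsed into (directive, path) pairs — it handles plain path prefixes only, not wildcards:

```python
def is_allowed(rules, path):
    """Return True if `path` may be fetched under `rules`.

    rules: list of ("allow" | "disallow", path_prefix) tuples.
    The longest matching prefix wins; on an exact tie, Allow wins
    (RFC 9309 prefers the least restrictive rule). No match = allowed.
    """
    best_len = -1
    best_allow = True  # default: everything is allowed
    for directive, prefix in rules:
        if prefix == "":  # empty Disallow matches nothing
            continue
        if path.startswith(prefix):
            allow = (directive == "allow")
            if len(prefix) > best_len or (len(prefix) == best_len and allow):
                best_len = len(prefix)
                best_allow = allow
    return best_allow

group = [("disallow", "/admin/"), ("allow", "/admin/public/")]
print(is_allowed(group, "/admin/settings"))         # False: only /admin/ matches
print(is_allowed(group, "/admin/public/logo.png"))  # True: longer Allow wins
print(is_allowed(group, "/blog/post"))              # True: no rule matches
```

Rule order inside the group never changes the outcome here — only match length does, which is why swapping the Allow and Disallow lines in a real file is harmless.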

Googlebot respects wildcards (* for any sequence of characters, $ to anchor a pattern to the end of the path) but ignores Crawl-delay — only Bing, Yandex, and some smaller crawlers honor it. Google treats a 4xx response on /robots.txt (including 404) as "no restrictions"; a 5xx response (500, 503) is treated as "site temporarily fully disallowed" — Google retries the file, and if the errors persist for around 30 days it falls back to the last cached copy or, failing that, to unrestricted crawling.
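One common way to implement the two wildcard operators is to translate the robots pattern into a regex — a sketch under that assumption (`pattern_matches` is a hypothetical helper, not how any particular crawler is known to do it internally):

```python
import re

def pattern_matches(pattern, path):
    """True if a robots.txt path pattern matches `path`.

    '*' matches any sequence of characters; a trailing '$' anchors
    the pattern to the end of the path. Matching always starts at
    the beginning of the path, as robots.txt rules do.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters in the literal parts, then
    # restore '*' as the regex wildcard '.*'.
    regex = "^" + re.escape(pattern).replace(r"\*", ".*")
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None

print(pattern_matches("/search", "/search/results"))    # True: prefix match
print(pattern_matches("/*.pdf$", "/files/report.pdf"))  # True: wildcard + anchor
print(pattern_matches("/*.pdf$", "/report.pdf?dl=1"))   # False: $ anchors the end
```

The last case is why `$` matters: without it, `Disallow: /*.pdf` would also block PDF URLs carrying query strings.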

Common misconceptions

  • "Robots.txt keeps pages out of the index." It keeps them out of the crawl. A URL with inbound links can be indexed with just a title and a "no description available" note even when Googlebot never fetched it. For actual index removal, use noindex.
  • "Noindex + Disallow is stronger than either alone." It's broken. Disallowed pages aren't crawled, so Google never sees the noindex tag. Pick one: noindex (let Google crawl it to see the directive) or Disallow (accept that the URL might still be indexed with no snippet).
  • "Disallow hides the existence of a path." No — robots.txt is public. /robots.txt is one of the first URLs anyone inspects on your site. If a path's existence is sensitive, authenticate it.
  • "Allow overrides Disallow always." Only when Allow is more specific (longer match). Disallow: /admin/ + Allow: /admin/ ties, and most crawlers break the tie in favor of allowing.