X-Robots-Tag
X-Robots-Tag is an HTTP response header that delivers crawler directives at the server level. Functionally equivalent to a meta robots tag but works on any resource, including PDFs, images, and other non-HTML files where meta tags cannot be embedded. Supports the same directives: noindex, nofollow, etc.
Long definition
The meta robots tag works only inside HTML <head>. That leaves PDFs, images, videos, JSON feeds, and any other non-HTML resource without a way to opt out of indexing — until you reach for X-Robots-Tag, the HTTP header version. Google documents both at the robots meta tag reference.
Syntax is identical to meta robots, just delivered in the response header:
X-Robots-Tag: noindex
X-Robots-Tag: noindex, nofollow
X-Robots-Tag: googlebot: noindex
X-Robots-Tag: bingbot: nofollow
X-Robots-Tag: noindex, unavailable_after: 25 Jun 2026 15:00:00 GMT
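Parsing these values is mechanical: split on commas, and treat a leading token before a colon as a crawler name unless it is itself a directive (as in `unavailable_after:`). A minimal Python sketch, where the function name and directive list are illustrative, not from any library:

```python
def parse_x_robots_tag(value):
    """Split one X-Robots-Tag header value into (user_agent, directives).

    A value may start with a user-agent token, e.g. "googlebot: noindex".
    Without one, the directives apply to all crawlers (user_agent is None).
    """
    directives = [d.strip() for d in value.split(",")]
    user_agent = None
    # "unavailable_after: <date>" also contains a colon, so only treat the
    # prefix as a user agent when it is not a known directive name.
    known = {"all", "noindex", "nofollow", "none", "noarchive", "nosnippet",
             "notranslate", "noimageindex", "unavailable_after",
             "indexifembedded", "max-snippet", "max-image-preview",
             "max-video-preview"}
    first = directives[0]
    if ":" in first:
        prefix, rest = first.split(":", 1)
        if prefix.strip().lower() not in known:
            user_agent = prefix.strip()
            directives[0] = rest.strip()
    return user_agent, directives
```

For example, `parse_x_robots_tag("googlebot: noindex, nofollow")` yields `("googlebot", ["noindex", "nofollow"])`, while a bare `"noindex"` yields `(None, ["noindex"])`.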
When X-Robots-Tag is the right tool:
- PDFs you don't want indexed — terms-of-service archives, internal docs accidentally exposed, generated reports.
- Images you don't want in image search — proprietary product shots, logos behind a paywall.
- API responses — JSON or XML endpoints that shouldn't appear in search results.
- Bulk directives at the server level — applying noindex to an entire /staging/ path via Nginx config without modifying every page template.
- Non-HTML resources with unavailable_after — scheduled removal of seasonal PDFs, expiring documents.
When meta robots is fine:
- Standard HTML pages where you control the template.
- Per-page directives managed in CMS metadata fields.
- Anything with template-level conditional logic — you already render the meta tag conditionally.
Server configuration examples:
- Nginx: add_header X-Robots-Tag "noindex" always;
- Apache: Header set X-Robots-Tag "noindex" inside <Files> or <FilesMatch>.
- Cloudflare Workers / edge: set the header on the response.
- CDN rules: most CDNs support response header injection by URL pattern.
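If your stack sits behind an app server rather than a CDN rule, the same path-scoped injection can be sketched as WSGI middleware in Python (the function name and path pattern are illustrative assumptions, not a standard API):

```python
import re

def x_robots_middleware(app, pattern=r"\.pdf$|^/staging/", value="noindex"):
    """WSGI middleware sketch: add X-Robots-Tag to any response whose
    request path matches `pattern`. Pattern and value are illustrative."""
    rx = re.compile(pattern)

    def wrapped(environ, start_response):
        def starter(status, headers, exc_info=None):
            if rx.search(environ.get("PATH_INFO", "")):
                headers = headers + [("X-Robots-Tag", value)]
            return start_response(status, headers, exc_info)
        return app(environ, starter)
    return wrapped
```

This mirrors the Nginx approach above: the directive is applied once, by URL pattern, instead of being edited into every template.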
Important: the URL must be crawlable for the directive to be honored. If you Disallow a path in robots.txt and also set X-Robots-Tag: noindex, Googlebot can't fetch the URL to read the header — and the URL may still appear in search results as an "indexed though blocked" entry. Pick one strategy: either let Googlebot crawl and read noindex, or block fully via robots.txt knowing the URL may still surface.
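This conflict can be caught mechanically before deploy. A small Python sketch using the standard library's robots.txt parser (the function name is illustrative):

```python
from urllib.robotparser import RobotFileParser

def noindex_conflict(robots_txt, url, user_agent="Googlebot"):
    """Return True if `url` is disallowed by robots.txt — in which case a
    noindex directive (header or meta) on that URL can never be read."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return not rp.can_fetch(user_agent, url)
```

Run it over any URL you are about to noindex: a True result means the crawler is blocked from fetching the page, so the directive would go unread.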
Common misconceptions
- "X-Robots-Tag is more powerful than meta robots." It's equivalent — same directives, same priority. Google and Bing honor both equally. The advantage is only that headers work on non-HTML resources.
- "You can use X-Robots-Tag to noindex pages blocked in robots.txt." No. Blocked URLs are never fetched, so the header is never read. The two mechanisms are mutually exclusive — pick one.
- "X-Robots-Tag is honored by all search engines." Google and Bing yes. Many AI bots, niche crawlers, and image scrapers ignore both meta robots and X-Robots-Tag. For hard exclusion, you need authentication or robots.txt.
- "Setting X-Robots-Tag on every response is a good default." Not unless you mean to deindex everything. Misconfigured global headers (e.g.
noindexon a staging environment that gets promoted to production) have caused real outages. Apply scoped to specific paths or content types.