AI training opt-out
AI training opt-out is the bundle of mechanisms for keeping your content out of language- and image-model training sets: robots.txt blocks for named user-agents, meta `noai`/`noimageai` tags, `X-Robots-Tag` HTTP headers, and — in the EU — the text-and-data-mining reservation under DSM Directive Article 4(3).
Long definition
There is no single switch. AI training opt-out is a layered practice, and each layer addresses a different actor and legal regime.
Layer 1 — robots.txt user-agent blocks. The baseline, and an honor system: it stops only crawlers that choose to comply. Block GPTBot, ClaudeBot, Google-Extended, CCBot, Bytespider, PerplexityBot, OAI-SearchBot, and any newly published agents as they appear. Publishers such as Reuters, The New York Times, and CNN each run robots.txt files with 20+ AI-related user-agent blocks.
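A minimal robots.txt implementing this layer could look like the following sketch. The agent list is illustrative and should be checked against each operator's current documentation, since crawler names change; grouping several User-agent lines over one Disallow rule is valid under RFC 9309.

```text
# robots.txt — block named AI training crawlers (honor-system only)
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
User-agent: CCBot
User-agent: Bytespider
User-agent: PerplexityBot
User-agent: OAI-SearchBot
Disallow: /
```

Ordinary search crawlers fall through to the default (or a separate `User-agent: *` group), so blocking AI agents this way does not affect search indexing.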
Layer 2 — meta tags. The noai and noimageai meta robots directives, popularized by DeviantArt and adopted by some image hosting platforms, signal "do not train on this content / image." Adoption is uneven. Stable Diffusion's training operators have publicly committed to honoring noai; others are silent. Useful as a belt-and-braces signal alongside robots.txt.
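In HTML, the convention rides on the standard meta robots mechanism. A minimal example:

```html
<!-- Community-convention directives: a signal, not an enforceable standard;
     honored by some operators, ignored by others -->
<meta name="robots" content="noai, noimageai">
```

Because these are conventions rather than a ratified standard, the tag costs nothing to add but should never be the only layer in place.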
Layer 3 — X-Robots-Tag HTTP header. The same directives as meta robots, delivered at the HTTP level. `X-Robots-Tag: noai, noimageai` covers non-HTML resources (PDFs, images) where a meta tag isn't possible.
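As a header, the signal can be attached server-wide in one line. A sketch for nginx (Apache's `Header set` directive works analogously):

```nginx
# Inside a server or location block — emits the signal on every
# response, including PDFs and images where no meta tag can live
add_header X-Robots-Tag "noai, noimageai" always;
```

The `always` flag makes nginx add the header on error responses too, not just 200s.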
Layer 4 — legal reservation, EU. The Digital Single Market Directive Article 4(3) gives rightholders a TDM (text-and-data-mining) opt-out: a "machine-readable" reservation against commercial mining of works. The Hamburg Regional Court's LAION ruling (September 2024) and follow-up case law have started defining what "machine-readable" means in practice — robots.txt blocks for AI bots and explicit ToS clauses both qualify in current jurisprudence. EU rightholders gain an enforceable position; non-EU sites do not.
Layer 5 — paywalls, login walls, and access control. The only mechanism that actually stops a non-compliant scraper. If the data isn't reachable without authentication, it isn't trained on (until credential leaks or contract violations come into play).
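The access-control idea reduces to: no valid credential, no content. A minimal sketch in Python — the header name and presence check are illustrative only; a real gate would validate the token against a session or subscription store:

```python
from http import HTTPStatus

def gate(headers: dict) -> int:
    """Return 200 only when the request carries a bearer token.

    Illustrative: token *presence* stands in for real validation.
    """
    token = headers.get("Authorization", "")
    if token.startswith("Bearer ") and len(token) > len("Bearer "):
        return int(HTTPStatus.OK)           # authenticated reader
    return int(HTTPStatus.UNAUTHORIZED)     # crawler or anonymous visitor

print(gate({}))                                   # prints 401
print(gate({"Authorization": "Bearer s3cr3t"}))   # prints 200
```

A scraper that ignores robots.txt still hits the 401 branch here, which is what distinguishes this layer from the signaling layers above it.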
For most publishers, Layers 1-3 plus a Terms of Service clause make up the working configuration. EU publishers should add Layer 4 by ensuring their reservation is machine-readable and unambiguous. Sites with high-value paid content should consider Layer 5 for the most sensitive material.
Note that none of these layers retroactively removes content from already-trained models. Opt-out affects future cycles only; the window to act closes with the next crawl in which your content appears.
Common misconceptions
- "A robots.txt block legally prevents training." Outside the EU, robots.txt is honor-system only. Inside the EU, it can serve as the "machine-readable" reservation for TDM purposes. The legal weight depends on jurisdiction.
- "Meta noai is a recognized standard." It's a community convention with growing but partial adoption. Treat it as a signal, not a guarantee.
- "Opt-out removes my content from existing models." It doesn't. GPT-4, Claude 3, Gemini Pro — anything already trained — keeps the content. Opt-out applies to future training runs.
- "I only need to block GPTBot." That covers OpenAI's first-party crawler. It doesn't touch CCBot, ClaudeBot, Google-Extended, Bytespider, or the dozen other named agents. A real opt-out lists all of them.
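The multi-agent point can be checked mechanically. A sketch using Python's standard `urllib.robotparser` to confirm that a generated robots.txt blocks each named crawler while leaving an ordinary search bot untouched — the agent list mirrors the one above and may lag behind newly announced crawlers:

```python
import urllib.robotparser

# Named AI crawlers from the layers above; check operator docs for new ones.
AI_AGENTS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot",
             "Bytespider", "PerplexityBot", "OAI-SearchBot"]

# One Disallow group per agent, as a real robots.txt would carry.
robots_txt = "\n".join(f"User-agent: {a}\nDisallow: /\n" for a in AI_AGENTS)

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# Every named AI agent is blocked from the whole site...
for agent in AI_AGENTS:
    assert not rp.can_fetch(agent, "https://example.com/article"), agent

# ...while an agent with no matching group falls through to default-allow.
print(rp.can_fetch("Googlebot", "https://example.com/article"))  # prints True
```

Running a check like this against your live robots.txt after each edit catches the "I blocked GPTBot and thought I was done" failure mode.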