GEO & AI Search · Glossary · Updated Apr 2026

ClaudeBot

Definition

ClaudeBot is Anthropic's web crawler, identified by the user-agent `ClaudeBot`. It's used to gather public content for training Claude models. Block it in robots.txt with a `User-agent: ClaudeBot` group and a `Disallow: /` rule. Two earlier user-agents, `anthropic-ai` and `Claude-Web`, are deprecated but worth listing for safety.

Long definition

ClaudeBot is the active crawler Anthropic uses to collect web content for training Claude. It identifies itself with a user-agent string containing ClaudeBot and follows standard robots.txt rules. Anthropic publishes documentation and IP ranges at anthropic.com — verifying log hits against the published ranges is the cleanest way to confirm a request is real.
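The verification step can be sketched with the standard library. This is a minimal example, not an official tool: the CIDR ranges below are placeholders, so substitute whatever ranges Anthropic currently publishes before relying on it.

```python
# Sketch: check whether a request claiming to be ClaudeBot actually
# originates from a published Anthropic IP range.
import ipaddress

# Placeholder CIDRs (RFC 5737 documentation blocks), NOT Anthropic's
# real ranges -- replace with the published list.
PUBLISHED_RANGES = [
    "192.0.2.0/24",
    "198.51.100.0/24",
]

def is_real_claudebot(client_ip: str, ranges=PUBLISHED_RANGES) -> bool:
    """Return True if client_ip falls inside any published range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in ranges)
```

Run this over the client IPs of log lines whose user-agent contains `ClaudeBot`; anything outside the ranges is a spoofer, not Anthropic.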

Blocking it is straightforward:

User-agent: ClaudeBot
Disallow: /

History matters here because Anthropic has changed how it identifies its crawlers. The earlier user-agents anthropic-ai and Claude-Web are deprecated, but defensive operators include all three in a single block group:

User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: Claude-Web
Disallow: /

Like OpenAI, Anthropic distinguishes between crawlers used for training and traffic generated when a user inside Claude.ai requests a web page in conversation. The Claude-User agent, when present, identifies live user-initiated fetches. Blocking ClaudeBot prevents future training use; it does not necessarily block live fetches a user explicitly initiates.
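In robots.txt terms, that split means blocking the training crawler and simply omitting Claude-User, which then falls back to your `User-agent: *` group, if any. This assumes Claude-User matches robots.txt groups in the standard way:

# Block training crawls; Claude-User has no group here,
# so user-initiated fetches are unaffected.
User-agent: ClaudeBot
Disallow: /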

Cloudflare's bot management dashboards show ClaudeBot among the most frequent AI training crawlers across the web, alongside GPTBot and CCBot. Volume is meaningful for large sites — operators have reported 10-100x higher ClaudeBot crawl rates than Googlebot during peak training windows in 2024-2025.

The decision is the same as for GPTBot: if you want to be cited in Claude's answers but not used as training data, block only ClaudeBot, which keeps you reachable in live-retrieval contexts where those exist. If you want a full opt-out, block the current agent and both legacy Anthropic agents, and consider also blocking CCBot, since Common Crawl is a known training input.
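A full opt-out along those lines might look like the following. Whether to include CCBot is a judgment call, since a Common Crawl block also affects non-AI reuse of that corpus:

User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: Claude-Web
Disallow: /

User-agent: CCBot
Disallow: /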

Common misconceptions

  • "Blocking anthropic-ai is enough." It's deprecated. The active agent is ClaudeBot. A robots.txt that only mentions anthropic-ai does nothing for current crawls.
  • "ClaudeBot ignores robots.txt." Anthropic publicly commits to honoring robots.txt. Reports of ignored rules typically trace back to spoofers using the user-agent string — verify against the published IP ranges before assuming bad behavior.
  • "Blocking ClaudeBot removes my content from existing Claude models." Training data already used cannot be retroactively unlearned. Blocks affect future training cycles only.
  • "All AI bots can be blocked with a single rule." Each crawler has its own user-agent. A wildcard User-agent: * block hits everyone including Googlebot, which is rarely what you want. List AI agents explicitly.