All topics
Crawling

Agent-aware robots.txt

Tell GPTBot, ClaudeBot and Perplexity what you allow.

Every major AI lab now publishes a named crawler. A modern robots.txt should explicitly grant or deny each one, otherwise you're betting on each company's default behavior (which is usually 'fetch everything').

The bots worth naming

  • GPTBot — OpenAI training crawler.
  • ChatGPT-User — live ChatGPT browsing on a user's behalf.
  • OAI-SearchBot — OpenAI search index.
  • ClaudeBot / Claude-User — Anthropic.
  • PerplexityBot / Perplexity-User — Perplexity AI.
  • Google-Extended — opt out of Gemini training without losing Search.
  • CCBot — Common Crawl (feeds most open models).

Opinionated default

User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: *
Disallow: /admin/
Sitemap: https://yoursite.com/sitemap.xml

Disallowing AI bots hides you from the answer layer. If discoverability matters, allow them.