Agent-aware robots.txt
Tell GPTBot, ClaudeBot and PerplexityBot what you allow.
Every major AI lab now publishes a named crawler. A modern robots.txt should explicitly grant or deny each one; otherwise you're betting on each company's default behavior, which is usually to fetch everything.
The bots worth naming
- GPTBot — OpenAI training crawler.
- ChatGPT-User — live ChatGPT browsing on a user's behalf.
- OAI-SearchBot — OpenAI search index.
- ClaudeBot / Claude-User — Anthropic.
- PerplexityBot / Perplexity-User — Perplexity AI.
- Google-Extended — opt out of Gemini training without losing Search.
- CCBot — Common Crawl (feeds most open models).
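One way to keep the file explicit is to generate it from a small policy table instead of editing it by hand. The following is a minimal Python sketch, not a standard tool: `POLICY` and `render_robots` are hypothetical names, and the allow/deny choices (including blocking CCBot) are illustrative only.

```python
# Hypothetical policy table: True = allow the crawler, False = block it.
# The keys are the user-agent tokens listed above; the decisions are examples.
POLICY = {
    "GPTBot": True,
    "ChatGPT-User": True,
    "OAI-SearchBot": True,
    "ClaudeBot": True,
    "Claude-User": True,
    "PerplexityBot": True,
    "Perplexity-User": True,
    "Google-Extended": True,
    "CCBot": False,  # example of an explicit deny
}


def render_robots(policy, sitemap_url=None):
    """Render one explicit robots.txt group per named crawler."""
    lines = []
    for agent, allowed in policy.items():
        lines.append(f"User-agent: {agent}")
        lines.append("Allow: /" if allowed else "Disallow: /")
        lines.append("")  # a blank line separates groups
    if sitemap_url:
        lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    print(render_robots(POLICY, "https://yoursite.com/sitemap.xml"))
```

Keeping the decisions in one table makes it harder to forget a crawler: adding a new bot is a single line, and every bot in the table gets an explicit rule.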
Opinionated default
User-agent: GPTBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: *
Disallow: /admin/
Sitemap: https://yoursite.com/sitemap.xml
Disallowing AI bots hides you from the answer layer. If discoverability matters, allow them.
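Before deploying, it can help to check how a parser actually reads the rules. The sketch below uses Python's standard `urllib.robotparser` on a copy of the default above; `ExampleBot` and the `load_rules` helper are hypothetical names used only for illustration. One thing it surfaces: a crawler that matches a named group follows only that group, so the `/admin/` rule under `User-agent: *` does not apply to GPTBot here. If `/admin/` should be off-limits to the named bots too, the `Disallow` line likely needs repeating in each of their groups.

```python
from urllib.robotparser import RobotFileParser

# Copy of the default above, embedded so the check runs offline.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: *
Disallow: /admin/
Sitemap: https://yoursite.com/sitemap.xml
"""


def load_rules(text):
    """Parse robots.txt text into a RobotFileParser (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

if __name__ == "__main__":
    checks = [
        ("GPTBot", "https://yoursite.com/blog/"),
        ("GPTBot", "https://yoursite.com/admin/"),      # named group only
        ("ExampleBot", "https://yoursite.com/admin/"),  # falls back to "*"
        ("ExampleBot", "https://yoursite.com/blog/"),
    ]
    for agent, url in checks:
        verdict = "allowed" if rules.can_fetch(agent, url) else "blocked"
        print(f"{agent:12} {url} -> {verdict}")
```

This only shows how Python's parser interprets the file; individual crawlers may differ in edge cases, so treat it as a sanity check and not a guarantee of each bot's behavior.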