# Control AI crawler access

import { Aside, Steps } from '@astrojs/starlight/components';

You decide which AI systems may use your documentation. The standard lever is `robots.txt`, where you allow or disallow individual crawlers by their user-agent token. This guide gives you a recommended strategy and a copy-paste template.

## Goal

Publish a `robots.txt` that opts your content out of AI model training while keeping it available to the AI search and assistant traffic that sends readers your way.

## Prerequisites

- The ability to serve a file at your site root (`https://docs.example.com/robots.txt`).
- The user-agent tokens you want to target. See [Identify AI bots and crawlers](/guides/ai-traffic/identify-bots/).

## The strategy: block training, allow search and on-demand

AI crawlers fall into categories that the [bot reference](/guides/ai-traffic/identify-bots/) describes in full. For documentation, a common position is:

- **Block training crawlers** — they consume bandwidth to build model datasets and return little direct value (for example, `GPTBot`, `ClaudeBot`, `CCBot`, `Bytespider`, `meta-externalagent`).
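- **Opt out of training with control tokens** — `Google-Extended` and `Applebot-Extended` are not separate crawlers. They are `robots.txt` tokens that opt you out of Gemini and Apple Intelligence training without affecting how Googlebot and Applebot index your site for search.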
- **Allow search indexers and on-demand fetchers** — these surface your docs in AI answers and fetch pages when a user asks a question, which sends qualified readers to you (for example, `OAI-SearchBot`, `Claude-SearchBot`, `PerplexityBot`, `ChatGPT-User`, `Claude-User`).

<Aside type="note">
This is a starting position, not the only valid one. If you want to opt out of AI search as well, disallow the search and on-demand tokens too. If you want maximum reach, allow everything.
</Aside>

## Copy-paste template

Save this as `robots.txt` at your site root, or merge it into your existing file, then adjust the lists to match your policy. A crawler that is not named here follows your `User-agent: *` group if you have one, and is otherwise unrestricted.

```text
# Block AI training crawlers
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: CCBot
User-agent: Bytespider
User-agent: meta-externalagent
User-agent: Google-Extended
User-agent: Applebot-Extended
User-agent: cohere-ai
Disallow: /

# Allow AI search indexers and on-demand fetchers
User-agent: OAI-SearchBot
User-agent: Claude-SearchBot
User-agent: PerplexityBot
User-agent: ChatGPT-User
User-agent: Claude-User
Allow: /
```
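Before you deploy, you can sanity-check how a parser reads these groups. The sketch below runs Python's standard-library `urllib.robotparser` over an abbreviated copy of the template. The `ROBOTS_TXT` constant, the `allowed` helper, and the `/guides/` path are illustrative names, and this parser matches tokens by case-insensitive substring, which may differ in detail from how a particular crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated copy of the template (illustrative; paste your real file here).
ROBOTS_TXT = """\
# Block AI training crawlers
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
Disallow: /

# Allow AI search indexers and on-demand fetchers
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
User-agent: Claude-User
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(token: str, path: str = "/guides/") -> bool:
    """Return True if the parsed rules let `token` fetch `path`."""
    return parser.can_fetch(token, path)


# "SomeOtherBot" is a made-up token that no group names, so it is unrestricted.
for token in ("GPTBot", "ClaudeBot", "OAI-SearchBot", "Claude-User", "SomeOtherBot"):
    print(f"{token}: {'allowed' if allowed(token) else 'blocked'}")
```

If a token you meant to allow prints as blocked, check for a typo in the token or for a group that names it twice.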

<Aside type="caution">
`robots.txt` states your preference; it does not enforce it. Well-behaved crawlers honor it, but some ignore it. To enforce a block, combine it with firewall or CDN rules, and verify a crawler is genuine before acting — see [Verify a bot is genuine](/guides/ai-traffic/identify-bots/#verify-a-bot-is-genuine).
</Aside>
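As one hedged illustration of enforcement at the application layer, the sketch below wraps a WSGI app and answers `403` when the `User-Agent` header contains a blocked token. The `BLOCKED_TOKENS` tuple and the `is_blocked` and `block_ai_training` names are hypothetical, and the header is trivially spoofable, so treat this as a complement to verification rather than a replacement. `Google-Extended` and `Applebot-Extended` are left out because they are `robots.txt` control tokens that do not appear in request headers.

```python
from typing import Callable, Iterable

# Hypothetical subset of the training-crawler tokens from the template.
BLOCKED_TOKENS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider", "meta-externalagent")


def is_blocked(user_agent: str) -> bool:
    """Case-insensitive substring match of blocked tokens against a User-Agent header."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in BLOCKED_TOKENS)


def block_ai_training(app: Callable) -> Callable:
    """WSGI middleware: answer 403 for blocked tokens, pass everything else through."""

    def middleware(environ: dict, start_response: Callable) -> Iterable[bytes]:
        # Always let crawlers read the policy file itself.
        if environ.get("PATH_INFO") == "/robots.txt":
            return app(environ, start_response)
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)

    return middleware
```

Most sites would express the same rule in a CDN or firewall configuration instead; the Python version only shows the matching logic.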

## Where to put the file

`robots.txt` must be served from the root of the host it governs, which is usually where `sitemap.xml` lives too; crawlers do not look for it in subdirectories. How you publish it depends on your platform:

- **Static sites (Astro, Docusaurus, and others)** — place `robots.txt` in the directory served at the root, such as `public/`.
- **Mintlify** — add the file through your project's static asset configuration.
- **Subdomains are separate** — a docs subdomain (`docs.example.com`) needs its own `robots.txt`; the root domain's file does not apply to it.
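
Crawlers look up `robots.txt` per origin (scheme, host, and port), which is why each subdomain needs its own copy. A minimal sketch of that lookup using Python's `urllib.parse` follows; the `robots_url` helper is a hypothetical name used only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs `page_url` (same scheme, host, and port)."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://docs.example.com/guides/ai-traffic/monitor/"))
# https://docs.example.com/robots.txt
print(robots_url("https://example.com/pricing"))
# https://example.com/robots.txt
```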

## Verify your setup

<Steps>

1. Confirm the file is served at the root:

   ```bash
   curl https://docs.example.com/robots.txt
   ```

2. Confirm your block and allow rules read as you intend, and that the tokens match the current values in the [bot reference](/guides/ai-traffic/identify-bots/).

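3. After deploying, [monitor your logs](/guides/ai-traffic/monitor/) to confirm that the crawlers you blocked stop requesting pages, and to catch any that ignore the rules. Compliant crawlers still fetch `/robots.txt` itself, and may take some time to pick up a changed file.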

</Steps>
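
For the log check in the last step, a rough sketch like the one below counts page requests from blocked tokens while ignoring fetches of `robots.txt` itself. It assumes your server writes the combined log format; the `BLOCKED_TOKENS` tuple, the `blocked_hits` helper, and the sample lines are all made up for illustration, so adapt the pattern to your own logs.

```python
import re
from collections import Counter

# Hypothetical subset of the tokens you disallowed in robots.txt.
BLOCKED_TOKENS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider")

# Assumes combined log format: ... "METHOD PATH PROTO" STATUS BYTES "REFERER" "USER-AGENT"
LINE_RE = re.compile(r'"[A-Z]+ (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"')


def blocked_hits(lines):
    """Count page requests per blocked token, ignoring fetches of robots.txt itself."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match or match["path"] == "/robots.txt":
            continue
        ua = match["ua"].lower()
        for token in BLOCKED_TOKENS:
            if token.lower() in ua:
                hits[token] += 1
    return hits


# Made-up sample lines, for illustration only.
sample = [
    '203.0.113.5 - - [01/Jan/2025:00:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.1)"',
    '203.0.113.5 - - [01/Jan/2025:00:00:05 +0000] "GET /guides/ HTTP/1.1" 200 9000 "-" "Mozilla/5.0 (compatible; GPTBot/1.1)"',
    '198.51.100.7 - - [01/Jan/2025:00:01:00 +0000] "GET /guides/ HTTP/1.1" 200 9000 "-" "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)"',
]
print(dict(blocked_hits(sample)))
```

A non-zero count for a blocked token after the rules have been live for a while suggests that the crawler is ignoring them or that the request is spoofed.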

## Related

- [Identify AI bots and crawlers](/guides/ai-traffic/identify-bots/) — the token reference these rules depend on.
- [Monitor AI traffic to your docs](/guides/ai-traffic/monitor/) — confirm your rules are working.
- [Serve a Markdown version of every page](/guides/serve-markdown/) — make the content you allow as readable as possible for AI.