# Understand AI traffic to your docs

import { Aside } from '@astrojs/starlight/components';

AI systems read your documentation. Training crawlers collect it, answer engines cite it, and assistants fetch it in real time to answer user questions. Most documentation teams have little visibility into this activity. This guide explains the kinds of AI traffic you receive and why they matter.

## Why AI traffic matters for documentation

Documentation is high-value text: it is structured, factual, and maintained. That makes it a prime target for AI systems that summarize products, answer technical questions, and generate code.

This affects your team in three ways:

- **Reach.** When an assistant answers a question using your docs, your content reaches users who never visit your site. Accurate, well-structured docs produce accurate answers.
- **Cost.** Aggressive crawlers consume bandwidth and server resources. High-volume bots can affect performance and hosting costs.
- **Control.** You can set different policies for training use and for real-time answers, but only once you can see which systems are requesting your content.

## Three categories of AI traffic

Not all AI traffic is the same. Distinguishing these categories tells you what each request is for and how to respond.

### Training crawlers

Training crawlers collect content to build the datasets that train large language models. They visit broadly and repeatedly, similar to search engine crawlers, but the content feeds model training rather than a search index.

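Examples include OpenAI's `GPTBot`, Anthropic's `ClaudeBot`, and Common Crawl's `CCBot`. Google and Apple take a different approach: they provide separate `robots.txt` tokens (`Google-Extended` and `Applebot-Extended`) that control training use of content their search crawlers already fetch. These tokens are not separate crawlers, so they do not appear as user agents in your logs.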

### On-demand fetchers

On-demand fetchers retrieve a specific page in real time because a user asked an assistant a question that references it. The request happens at the moment of the conversation, not during bulk crawling.

Examples include OpenAI's `ChatGPT-User`, Anthropic's `Claude-User`, and `Perplexity-User`. This traffic is typically much lower in volume than bulk crawling and is tied directly to user intent, so it is often the most valuable to allow.

### AI search indexers

AI search indexers build the indexes behind answer engines and AI search features. They crawl to keep an index current, which the engine then queries to generate cited answers.

Examples include OpenAI's `OAI-SearchBot`, Anthropic's `Claude-SearchBot`, and `PerplexityBot`.

<Aside type="note">
The same operator often runs separate bots for each category. OpenAI, for example, uses `GPTBot` for training, `OAI-SearchBot` for search indexing, and `ChatGPT-User` for on-demand fetches. This lets you allow one purpose while blocking another.
</Aside>
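
As a rough illustration of the three categories, the sketch below maps the user-agent tokens named in this guide to a category. The `AI_BOT_CATEGORIES` table, the `classify_user_agent` function, and the category labels are hypothetical names chosen for this example, and the token list is not exhaustive.

```python
from typing import Optional

# Tokens taken from this guide only; operators add and rename bots over time,
# so treat this table as a starting point rather than a complete list.
AI_BOT_CATEGORIES = {
    "GPTBot": "training",
    "ClaudeBot": "training",
    "CCBot": "training",
    "ChatGPT-User": "on-demand",
    "Claude-User": "on-demand",
    "Perplexity-User": "on-demand",
    "OAI-SearchBot": "search-index",
    "Claude-SearchBot": "search-index",
    "PerplexityBot": "search-index",
}


def classify_user_agent(user_agent: str) -> Optional[str]:
    """Return the AI traffic category for a User-Agent header, or None."""
    ua = user_agent.lower()
    for token, category in AI_BOT_CATEGORIES.items():
        # Case-insensitive substring match on the bot token.
        if token.lower() in ua:
            return category
    return None


# Simplified, made-up User-Agent strings for demonstration.
print(classify_user_agent("Mozilla/5.0 (compatible; GPTBot/1.0)"))
print(classify_user_agent("Mozilla/5.0 (compatible; ChatGPT-User/1.0)"))
print(classify_user_agent("Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0"))
```

A token match only shows what a request claims to be. User-agent strings can be spoofed, so treat the result as a first-pass label until the source is verified.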

## A fourth signal: AI referral traffic

Beyond bots, you also receive human visitors who arrive from AI products. When an answer engine cites your documentation and a user clicks through, that visit appears in your web analytics with a referrer such as `chatgpt.com` or `perplexity.ai`.

Referral traffic does not come from a crawler. Each visit is a real person who found you through an AI answer. Tracking it shows whether AI citations send qualified readers to your docs.
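
One way to separate these visits from other referrals is to compare the referrer's hostname against a list of AI product domains. The sketch below uses only the two hostnames mentioned above. The `AI_REFERRER_HOSTS` set and the `is_ai_referral` function are hypothetical names for this example, and a real list would likely need more entries.

```python
from urllib.parse import urlparse

# Only the referrer hosts named in this guide; extend as needed.
AI_REFERRER_HOSTS = {"chatgpt.com", "perplexity.ai"}


def is_ai_referral(referrer: str) -> bool:
    """Return True if the referrer URL belongs to a known AI product host."""
    host = (urlparse(referrer).hostname or "").lower()
    # Match the host itself or any subdomain (for example, www.perplexity.ai),
    # but not unrelated hosts that merely end with the same characters.
    return any(host == known or host.endswith("." + known)
               for known in AI_REFERRER_HOSTS)


print(is_ai_referral("https://chatgpt.com/"))
print(is_ai_referral("https://www.perplexity.ai/search?q=docs"))
print(is_ai_referral("https://www.example.com/"))
```

Requests with an empty or missing referrer return `False`, so this check undercounts visits from products that strip the referrer.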

## Monitoring versus controlling

Two distinct activities are often confused:

- **Monitoring** measures who reads your docs and how often. It is read-only and never blocks anyone.
- **Controlling** uses tools such as `robots.txt`, firewall rules, or edge configuration to allow or deny access. See [Control AI crawler access](/guides/control-ai-access/).

`robots.txt` is a control tool, not a monitoring tool: it states your preferences but does not measure or enforce them. Compliance is voluntary. Well-behaved bots honor it, and others may not. Always measure actual traffic from your logs rather than assuming your `robots.txt` rules are followed.
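
To make the gap between stated preferences and actual behavior concrete, the sketch below checks logged requests against a `robots.txt` policy using Python's standard `urllib.robotparser` module. The policy, the logged requests, and the `find_disallowed_requests` function are all hypothetical. In practice the request list would come from your own access logs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: block training use, allow on-demand fetches.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /
"""

# Hypothetical (user-agent token, path) pairs, as if extracted from logs.
LOGGED_REQUESTS = [
    ("GPTBot", "/guides/install/"),
    ("ChatGPT-User", "/guides/install/"),
]


def find_disallowed_requests(robots_txt, requests):
    """Return the logged requests that the robots.txt policy disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [(agent, path) for agent, path in requests
            if not parser.can_fetch(agent, path)]


violations = find_disallowed_requests(ROBOTS_TXT, LOGGED_REQUESTS)
print(violations)
```

Any entry in `violations` is a request that arrived even though the policy disallows it, which is exactly what `robots.txt` alone cannot tell you.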

## Next steps

- [Identify AI bots and crawlers](/guides/ai-traffic/identify-bots/) — match user-agent tokens to operators and verify them.
- [Monitor AI traffic to your docs](/guides/ai-traffic/monitor/) — measure this traffic with logs and analytics.