# Monitor AI traffic to your docs

import { Aside, Steps } from '@astrojs/starlight/components';

This guide shows you how to measure the AI traffic reaching your documentation. You identify AI crawlers in your access data and track human visitors who arrive from AI answer engines. The techniques are tool-agnostic and work with most hosting and analytics stacks.

## Goal

Produce a repeatable view of:

- Which AI bots fetch your documentation, and how often.
- Which pages they fetch most.
- How many human visitors arrive from AI products such as ChatGPT and Perplexity.

## Prerequisites

- Access to one source of traffic data: raw server or access logs, a CDN or edge dashboard, or a web analytics tool.
- The user-agent tokens you want to track. See [Identify AI bots and crawlers](/guides/ai-traffic/identify-bots/).
- For verification, the ability to check source IP addresses against published ranges or reverse DNS.

## Choose where to measure

Each data source captures a different slice of traffic. Pick the one that matches what you need.

| Source | Captures | Best for |
|--------|----------|----------|
| Server and access logs | Every request, including bots that run no JavaScript | Complete crawler visibility |
| CDN and edge analytics | Requests at the network edge, often with built-in bot labels | Sites behind a CDN that want low-effort dashboards |
| Web analytics (such as GA4) | Human visits that execute JavaScript | AI referral traffic from real users |

<Aside type="caution">
Most AI crawlers do not execute JavaScript, so they never appear in client-side web analytics. To see crawlers, you need server logs or edge data. Use web analytics for human referral traffic, not for bot activity.
</Aside>

## Measure AI crawlers in server logs

Server logs record the user agent and source IP of every request, which makes them the most complete source for crawler activity.

<Steps>

1. Confirm your access logs include the user-agent field. The combined log format, which nginx uses by default and many Apache configurations enable, includes it; the older common log format does not.

2. Filter the logs for the tokens you want to track. For example, to count requests from major AI crawlers in an access log:

   ```bash
   grep -E "GPTBot|ClaudeBot|OAI-SearchBot|PerplexityBot|CCBot|Bytespider" access.log | wc -l
   ```

3. Break the activity down by bot and by page to see what each crawler reads:

   ```bash
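   # Requests per AI bot token
   grep -oE "GPTBot|ClaudeBot|OAI-SearchBot|PerplexityBot|CCBot|Bytespider" access.log \
     | sort | uniq -c | sort -rn
   # Most-fetched pages (field 7 is the request path in the combined log format)
   grep -E "GPTBot|ClaudeBot|OAI-SearchBot|PerplexityBot|CCBot|Bytespider" access.log \
     | awk '{print $7}' | sort | uniq -c | sort -rn | head -n 20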
   ```

4. Verify high-volume bots are genuine before you act on the numbers. Match the source IP against the operator's published ranges or use forward-confirmed reverse DNS, as described in [Verify a bot is genuine](/guides/ai-traffic/identify-bots/#verify-a-bot-is-genuine).

</Steps>
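
The IP-range half of that verification can be sketched in plain shell. The example below is a minimal illustration, not an operator-provided tool: the helper names `ip_to_int` and `ip_in_cidr` are hypothetical, it handles IPv4 only, and the range in the usage note is a reserved documentation range rather than any operator's real one. Take the actual ranges from each operator's published list.

```bash
# Sketch: check whether an IPv4 address falls inside a CIDR range.
# ip_to_int and ip_in_cidr are hypothetical helper names. IPv4 only.

# Convert a dotted-quad address to a 32-bit integer.
# The body runs in a subshell, so the IFS change does not leak out.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Succeed (exit status 0) if address $1 is inside range $2, for example 192.0.2.0/24.
ip_in_cidr() {
  ip_int=$(ip_to_int "$1")
  net_int=$(ip_to_int "${2%/*}")   # network part, before the slash
  bits=${2#*/}                     # prefix length, after the slash
  mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
  [ $(( $ip_int & $mask )) -eq $(( $net_int & $mask )) ]
}
```

For example, `ip_in_cidr 192.0.2.57 192.0.2.0/24` succeeds, while `ip_in_cidr 198.51.100.1 192.0.2.0/24` fails. A request that claims a bot's user agent but falls outside every published range for that operator is a candidate for spoofing.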

<Aside type="tip">
For ongoing visibility, ship logs to a log analytics platform and save these filters as a scheduled report or dashboard. A weekly summary surfaces new bots and traffic spikes without manual grepping.
</Aside>
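
If you do not have a log analytics platform yet, a small script can stand in for the report. The sketch below counts requests per day and per bot token with `awk`. It assumes the combined log format, where the fourth whitespace-separated field holds the timestamp. The function name `ai_bot_daily_counts` is illustrative, and the token list repeats the example tokens used earlier in this guide.

```bash
# Sketch: daily request counts per AI bot token.
# Assumes the combined log format; ai_bot_daily_counts is a hypothetical name.
ai_bot_daily_counts() {
  awk '
    match($0, /GPTBot|ClaudeBot|OAI-SearchBot|PerplexityBot|CCBot|Bytespider/) {
      bot = substr($0, RSTART, RLENGTH)   # first matching token on the line
      day = substr($4, 2, 11)             # "[10/Oct/2024:13:55:36" -> "10/Oct/2024"
      counts[day " " bot]++
    }
    END {
      for (key in counts) print key, counts[key]
    }
  ' "$1" | sort
}
```

Call it as `ai_bot_daily_counts access.log`. Each output line has the form `day bot count`. The final `sort` is lexical, not chronological, and the pattern matches a token anywhere on the line, so a token that appears in a URL or referrer is counted too.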

## Find AI traffic on your hosting platform

Your hosting platform sees every request at the edge, including bots that run no JavaScript, so it can surface AI-bot traffic even for a fully static site. Use the view that matches your platform.

### Vercel

Open **Observability → Edge Requests**, which breaks traffic down by individual bot and bot category, including AI crawlers and search engines, on all plans. **Firewall Observability** shows requests by user agent and IP, and the **AI bots managed ruleset** lets you log or deny known AI crawlers from a list Vercel maintains. For raw access logs, stream [Log Drains](https://vercel.com/docs/log-drains) to your own store; this is a paid feature. See [Bot Management](https://vercel.com/docs/bot-management) for details.

### Netlify

Set up [Log Drains](https://docs.netlify.com/manage/monitoring/log-drains/) to stream server-level request logs to a destination such as Datadog or S3. Netlify tags known AI crawlers (`GPTBot`, `ClaudeBot`, `PerplexityBot`, and others) under an `ai` [user-agent category](https://docs.netlify.com/build/user-agent-categories/) and exposes it on the `Netlify-Agent-Category` request header, so you can filter AI traffic without maintaining your own token list. Because this is server-side, it captures bots that client-side analytics drop.

### Cloudflare

If your site uses Cloudflare as a proxy, [AI Crawl Control](https://developers.cloudflare.com/ai-crawl-control/) (formerly AI Audit) shows which AI services access your content, broken down by provider, bot type (AI data scraper, AI search crawler, archiver), and the sections they fetch — and flags which crawlers honor your `robots.txt`. It is the most detailed AI-traffic view of the three, and you can set allow or block rules from the same screen.

### Another host or CDN

If your platform sees requests at the edge, check its dashboard for a bot or crawler analytics view, and prefer reports that verify bots over ones that trust the user-agent string. Otherwise, fall back to server logs as described in the preceding section.

<Aside type="tip">
These platforms maintain their own AI-bot lists, so you get verified detection without matching user-agent tokens by hand. To verify a bot outside these tools, see [Verify a bot is genuine](/guides/ai-traffic/identify-bots/#verify-a-bot-is-genuine).
</Aside>

## Track AI referral traffic in web analytics

Referral traffic is the human side of AI traffic: visitors who click through from an AI answer. Capture it in your web analytics tool by segmenting on the referrer.

<Steps>

1. In your analytics tool, create a segment or filter for traffic whose source or referrer hostname matches an AI product, such as `chatgpt.com`, `perplexity.ai`, `gemini.google.com`, or `copilot.microsoft.com`.

2. Compare landing pages, engagement, and conversions for this segment against your other channels to judge whether AI citations send qualified readers.

3. Review the segment over time. The set of AI products that drive referrals changes, so revisit the hostname list each quarter.

</Steps>
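
If you also keep server logs, you can approximate the same segment from the command line as a cross-check. The sketch below assumes the combined log format, where splitting a line on double quotes puts the referrer in the fourth field. The function name `ai_referral_counts` is illustrative, and the hostname list is only the example set from step 1.

```bash
# Sketch: count requests whose referrer is a known AI product hostname.
# Assumes the combined log format; ai_referral_counts is a hypothetical name.
ai_referral_counts() {
  awk -F'"' '
    {
      host = $4
      sub(/^https?:\/\//, "", host)   # drop the scheme
      sub(/[\/?#].*$/, "", host)      # drop path, query, and fragment
      sub(/^www\./, "", host)         # treat www.example and example alike
      if (host ~ /^(chatgpt\.com|perplexity\.ai|gemini\.google\.com|copilot\.microsoft\.com)$/)
        counts[host]++
    }
    END { for (h in counts) print counts[h], h }
  ' "$1" | sort -rn
}
```

Run it as `ai_referral_counts access.log` to get one `count hostname` line per AI product, highest count first. This counts requests, not sessions, so expect the numbers to differ from your analytics tool.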

<Aside type="note">
Some AI clients strip or omit the referrer, so referral analytics undercounts true AI-sourced visits. Treat it as a directional signal, and combine it with server-log crawler data for the full picture.
</Aside>

## A note on privacy and abuse

- **Do not log personal data you do not need.** Access logs can contain IP addresses and other identifiers. Follow your retention and privacy policies, and anonymize where your obligations require it.
- **Rate-limit abusive crawlers, not all bots.** If a verified bot fetches aggressively enough to affect performance, rate-limit or throttle it at the edge. Reserve blocking for bots you have confirmed are both unwanted and genuine.
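
One way to reduce the personal data you retain is to truncate client IP addresses before you archive logs. The sketch below is a minimal example under stated assumptions: the client address is the first field of each line and is IPv4, and `anonymize_ipv4` is an illustrative name. IPv6 addresses pass through unchanged, so handle them separately if they appear in your logs.

```bash
# Sketch: zero the last octet of a leading IPv4 address on each log line.
# anonymize_ipv4 is a hypothetical name; IPv6 lines are left untouched.
anonymize_ipv4() {
  sed -E 's/^([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\.[0-9]{1,3} /\1.0 /' "$1"
}
```

For example, `anonymize_ipv4 access.log > access.anon.log` rewrites `203.0.113.57` to `203.0.113.0`. Run any bot verification first, because checking an address against published ranges or reverse DNS needs the full IP. Whether this level of truncation meets your obligations is a policy decision, not a technical one.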

## Verify your setup

You have a working monitor when you can answer, from your own data:

- Which AI bots reached your docs in the last week, and how many requests each made.
- Which pages those bots fetched most.
- How many human visitors arrived from AI products, and where they landed.

## Related

- [Understand AI traffic to your docs](/guides/ai-traffic/understanding/) — the categories behind the numbers.
- [Identify AI bots and crawlers](/guides/ai-traffic/identify-bots/) — the tokens and verification methods this guide relies on.
- [Add an llms.txt file to your docs](/guides/llms-txt/add-llms-txt/) — help AI systems use your content well once you can see them.