AI Crawlers vs Search Crawlers: the Complete 2026 Bot List

Two fleets, three jobs

Every bot that visits your website belongs to one of two fleets. The first is the traditional search crawlers, such as Googlebot and Bingbot, which index pages so that a search engine can rank the web. The second is the AI crawlers, which fetch pages for large language models. Within that second group, three different jobs are going on. Some crawlers collect content to train or update a model. Others build a retrieval index that an assistant answers from. And others fetch a single page on demand, because a user has asked the assistant to read it right now. The difference matters because each job maps to a different decision on your side. Being indexed so you can be quoted is a visibility question, while being used as training data is a licensing call.

OpenAI: GPTBot, OAI-SearchBot, ChatGPT-User

OpenAI has three crawlers you should know about. GPTBot is the general-purpose crawler that gathers content for training and, increasingly, retrieval. OAI-SearchBot builds the index behind ChatGPT Search and is the one most closely tied to whether you appear as a source in ChatGPT's answers. ChatGPT-User is different: it fires when someone asks ChatGPT to open or browse a specific URL, fetching that page on demand rather than crawling broadly. If you want to show up in ChatGPT, the two crawlers to keep enabled are OAI-SearchBot and GPTBot.

Anthropic: ClaudeBot and Claude-User

ClaudeBot is Anthropic's general web crawler, gathering content for Claude. Claude-User is its live-browsing counterpart, fetching a page when someone using Claude wants to look something up or read a link. As with OpenAI, a background crawler is paired with an on-demand fetcher, and blocking either one removes you from the corresponding behaviour in Claude.

Google: Googlebot and Google-Extended

Google is the one that tends to confuse people. Googlebot crawls for Google Search and feeds AI Overviews; blocking it removes you from Google entirely, which is almost never the intent. Google-Extended is a separate control that governs whether your content is used for Gemini grounding and training. The useful part is that you can keep Googlebot fully open for normal search and AI Overviews and still decide separately, through Google-Extended, whether to take part in Gemini's training. They are two separate switches, and mixing them up is an easy trap to fall into.
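As a rough illustration of the two switches, the sketch below uses Python's standard-library urllib.robotparser to evaluate a hypothetical robots.txt that leaves Googlebot open and opts out through Google-Extended. The rule set and the example.com URL are assumptions made for illustration, and the standard-library parser only approximates how Google itself evaluates these rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: search stays open, Gemini use is opted out.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: Google-Extended
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given user-agent token may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


PAGE = "https://example.com/article"  # placeholder URL, not a real page
googlebot_ok = is_allowed(ROBOTS_TXT, "Googlebot", PAGE)
extended_ok = is_allowed(ROBOTS_TXT, "Google-Extended", PAGE)
print("Googlebot allowed:", googlebot_ok)
print("Google-Extended allowed:", extended_ok)
```

Under these assumed rules the two tokens get different answers, which is the point: one group does not affect the other.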

Perplexity, Microsoft and Apple

Perplexity runs PerplexityBot to build its answer index and Perplexity-User for on-demand fetches triggered by a user. Microsoft has Bingbot, which serves both Bing search and Copilot, so it straddles the two fleets. Apple has Applebot for search and exposes Applebot-Extended to control whether your content is used to train its models. That mirrors Google's setup: one search crawler, plus a separately controllable opt-out from AI training.

The dataset and training crawlers

Finally, there are crawlers that exist primarily to collect training data rather than to drive your visibility in an answer engine. CCBot, for instance, crawls for Common Crawl, an open dataset that has served as a training corpus for a large number of language models. It is a single crawler, but it matters a great deal because of how widely Common Crawl's data is used. There is also Bytespider, the crawler for ByteDance; Meta-ExternalAgent, for Meta's AI; and Amazonbot, for Amazon and Alexa. Blocking these does not knock you out of an answer engine the way blocking GPTBot or PerplexityBot does; it only determines whether you take part in their training sets.

What to do with this

The takeaway is that "should I block AI crawlers" is the wrong question, because the fleet is by no means monolithic. It is better to decide which of these bots you want and why: keeping the answer-engine crawlers enabled for AI visibility is a different question from allowing a training-only crawler, which is an optional licensing decision. Each bot in the list above can be allowed or denied by user-agent in robots.txt, so once you know who is who, the control is yours. To see how your website currently looks across all of them, try the AI readiness checker. To turn your decision into a robots.txt rule, use our guide to blocking AI crawlers. For the high-level picture, read the overview of generative engine optimization.
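One way to turn that per-bot decision into rules is sketched below: a small Python helper that renders one robots.txt group per user-agent. The grouping into two lists follows this article, the helper name build_robots_txt is hypothetical, and the allow/deny split shown is only an example policy, not a recommendation.

```python
# Example grouping based on this article; adjust both lists to your own policy.
ANSWER_ENGINE_BOTS = [
    "Googlebot", "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "PerplexityBot", "Google-Extended",
]
TRAINING_BOTS = ["CCBot", "Bytespider", "Meta-ExternalAgent", "Amazonbot"]


def build_robots_txt(allow: list[str], deny: list[str]) -> str:
    """Render one robots.txt group per bot, allowing or disallowing the whole site."""
    groups = []
    for bot in allow:
        groups.append(f"User-agent: {bot}\nAllow: /")
    for bot in deny:
        groups.append(f"User-agent: {bot}\nDisallow: /")
    # Groups are separated by a blank line; the file ends with a newline.
    return "\n\n".join(groups) + "\n"


robots_txt = build_robots_txt(allow=ANSWER_ENGINE_BOTS, deny=TRAINING_BOTS)
print(robots_txt)
```

Keeping the policy as two plain lists means a change of mind about one bot is a one-line edit, and the generated file can be reviewed before it is deployed.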

Frequently asked questions

What is the difference between an AI crawler and a search crawler?

A search crawler like Googlebot or Bingbot indexes the web for a search engine. An AI crawler fetches pages for a large language model — either to train it, to build a retrieval index it answers from, or to read a page live when a user asks. They are separate fleets with separate user-agents, which is why you can allow one and block another.

Which AI crawlers should I allow?

If you want visibility in AI answers, allow the answer-engine crawlers: Googlebot, GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, PerplexityBot and Google-Extended. Whether you allow the pure training and dataset crawlers, such as CCBot or Bytespider, is more of a content-licensing decision than a visibility one.

Do these crawlers respect robots.txt?

The major, named crawlers publicly state that they honour robots.txt user-agent rules. Less reputable scrapers may ignore it, but for the well-known bots listed here, a disallow line is respected.

How do I check which ones my site allows?

Run the AI readiness checker. It reads your robots.txt and reports, bot by bot, which of the major AI and search crawlers are currently allowed or blocked on your site.
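For a sense of what a bot-by-bot report involves, here is a minimal sketch using Python's standard-library urllib.robotparser. It is not how the AI readiness checker is implemented; the report function, the bot list and the sample robots.txt are all assumptions for illustration, and fetching the live file is left out so the example runs offline.

```python
from urllib.robotparser import RobotFileParser

# Illustrative subset of the bots discussed in this article.
BOTS = [
    "Googlebot", "Bingbot", "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "PerplexityBot", "Google-Extended", "CCBot", "Bytespider",
]

# Hypothetical robots.txt that blocks two dataset crawlers and allows the rest.
SAMPLE = """\
User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""


def report(robots_txt: str, bots: list[str], path: str = "/") -> dict[str, bool]:
    """Map each bot name to True (allowed) or False (blocked) for the given path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, path) for bot in bots}


status = report(SAMPLE, BOTS)
for bot, allowed in status.items():
    print(f"{bot:16} {'allowed' if allowed else 'blocked'}")
```

The standard-library parser matches user-agent names loosely and does not implement every extension that individual crawlers support, so treat its output as an approximation of what each bot will actually do.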