Hacker News

by lemonademan 21 hours ago

To answer the first question, yes! Web operators and security professionals actively track and categorize AI agents separately from traditional search crawlers, because the two serve fundamentally different purposes and affect site resources in different ways.
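As a rough illustration of that kind of categorization, here is a minimal Python sketch that sorts User-Agent strings into AI crawlers, search engine crawlers, and everything else, then computes each bucket's share of traffic. The token lists are assumptions for illustration, not a complete registry, and the sample strings are shortened placeholders rather than the bots' exact headers. User-Agent headers are self-reported and easy to spoof, so "other" is not the same as "human".

```python
from collections import Counter

# Illustrative, non-exhaustive token lists (assumption). The AI crawlers are the
# ones named in this thread; the search tokens are the usual Google/Bing/Yandex names.
AI_BOT_TOKENS = ("gptbot", "claudebot", "perplexitybot")
SEARCH_BOT_TOKENS = ("googlebot", "bingbot", "yandexbot")


def classify_user_agent(user_agent: str) -> str:
    """Bucket a User-Agent header into 'ai_bot', 'search_bot' or 'other'.

    This is a case-insensitive substring check on a self-reported header,
    so spoofed or undeclared bots end up in 'other'.
    """
    ua = user_agent.lower()
    if any(token in ua for token in AI_BOT_TOKENS):
        return "ai_bot"
    if any(token in ua for token in SEARCH_BOT_TOKENS):
        return "search_bot"
    return "other"


def traffic_shares(user_agents: list[str]) -> dict[str, float]:
    """Return each category's fraction of the given requests (empty input gives {})."""
    counts = Counter(classify_user_agent(ua) for ua in user_agents)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()} if total else {}


# Shortened placeholder User-Agent strings, not the bots' exact headers.
sample = [
    "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "Mozilla/5.0 (compatible; Googlebot/2.1)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
]
print(traffic_shares(sample))
```

A substring match is enough to see the rough split in your logs; if you need to trust the result, Google, for example, documents a reverse DNS check for confirming that a request claiming to be Googlebot really is.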

I built a database website a few months ago and submitted it to Google, Bing, and Yandex. Two months later, my Cloudflare dashboard was showing 1.5 million unique visitors a month. Humans accounted for only about 10% of that; the rest was search engine crawlers, followed by AI crawlers. I also found that AI bots (like GPTBot, ClaudeBot, or PerplexityBot) scraped aggressively, without respecting traditional crawl limits or properly checking robots.txt, which drove up server load.
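robots.txt only works if the crawler chooses to read and obey it: the site declares rules, and each bot decides what to do with them. The Python sketch below uses the standard library's `urllib.robotparser` to show how a compliant crawler would interpret a hypothetical robots.txt that blocks GPTBot, asks ClaudeBot for a 10-second `Crawl-delay`, and keeps everyone else out of a made-up `/search` path. The file contents and the `/records/42` path are assumptions for illustration; `Crawl-delay` in particular is a non-standard extension that not every crawler honors.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt (assumption): block GPTBot entirely, ask ClaudeBot
# to wait 10 seconds between requests, keep all other agents out of /search.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Crawl-delay: 10
Disallow: /search

User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
# parse() takes an iterable of lines; no network request is made here.
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("GPTBot", "ClaudeBot", "Googlebot"):
    # can_fetch() picks the first group whose User-agent matches,
    # falling back to the '*' group; crawl_delay() returns None if unset.
    print(
        agent,
        "may fetch /records/42:", parser.can_fetch(agent, "/records/42"),
        "crawl delay:", parser.crawl_delay(agent),
    )
```

Here GPTBot is refused `/records/42`, ClaudeBot may fetch it with a 10-second delay, and Googlebot falls through to the `*` group. None of this is enforced on the server side: a bot that never reads robots.txt is unaffected, which is why load from non-compliant bots has to be handled at the server or CDN instead.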

That should also answer the second question: AI retrieval systems index semantic structure faster than they index page content. AI doesn't just index your website the way regular crawlers do; those mostly index your content, schema markup, and so on. AI bots go deeper and try to understand your site's structure, since that also helps in training other AI models.