Hacker News
by robhoeijmakers 53 days ago
In the ChatGPT context, it is worth distinguishing properly between crawlers and agents. The crawlers collect content to train a new model; the agents fetch content on behalf of users. The latter can be very useful.
1 comment

They use different user-agent strings. The crawlers obfuscate themselves and use residential proxies, while the agents identify themselves as ChatGPT-User. Of course, Cloudflare wants OpenAI to pay them for not blocking ChatGPT-User by default.
It's true: crawlers used for AI training don't identify themselves as crawlers at all.
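
To make the user-agent point concrete, here is a rough sketch of what a server-side check could look like. It assumes only what the thread says: agents put the token ChatGPT-User in their User-Agent header. The function name `classify_user_agent` and the sample header strings are made up for illustration, and the header is self-reported, so a crawler that poses as a browser simply ends up in the "unknown" bucket.

```python
def classify_user_agent(user_agent: str) -> str:
    """Return 'agent' for a self-declared ChatGPT-User request, else 'unknown'.

    Hypothetical helper: it relies only on the ChatGPT-User token
    mentioned in the thread, not on any official list of user agents.
    """
    # Case-insensitive substring match, since the full header usually
    # carries more than the bare token.
    if "chatgpt-user" in user_agent.lower():
        return "agent"
    # Everything else, including a crawler that hides behind a browser
    # string, cannot be told apart from ordinary traffic by this check.
    return "unknown"


if __name__ == "__main__":
    # Illustrative header values, not captured from real traffic.
    samples = [
        "Mozilla/5.0 (compatible; ChatGPT-User/1.0)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    ]
    for ua in samples:
        print(classify_user_agent(ua), "<-", ua)
```

The check only ever confirms the traffic that announces itself, which is the asymmetry described above: the agents are easy to allow or block, and the obfuscated crawlers are not.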