by dartos 593 days ago
Should we webmasters just start blocking user agents wholesale?

I mean except known good actors.

I guess known actors would need a verifiable signature.

4 comments

Not viable. They are going to use user agents that look like those coming from completely normal human users.

"Verifiable signature"? That's a dangerous road to go down, and Google actually wanted to do it (the Web Environment Integrity API). Nobody supported them and they backed out.

Search engine crawlers do have verifiable signatures. If a client claims to be Googlebot or Bingbot, you don't have to take their word for it.

https://developers.google.com/search/docs/crawling-indexing/...

https://www.bing.com/webmasters/help/how-to-verify-bingbot-3...
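For what it's worth, the usual check is a reverse DNS lookup on the client IP followed by a forward lookup to confirm the hostname maps back to the same IP. Here is a rough sketch, not a definitive implementation. The function names and the suffix table are my own; the suffixes are what I understand the vendors to publish, so check them against the docs linked above before relying on them.

```python
import socket

# Assumption: hostname suffixes taken from the vendors' published
# verification guidance; confirm against the linked docs.
CRAWLER_SUFFIXES = {
    "googlebot": (".googlebot.com", ".google.com"),
    "bingbot": (".search.msn.com",),
}


def reverse_lookup(ip):
    """Return the PTR hostname for an IP address."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(host):
    """Return the IPv4 addresses a hostname resolves to."""
    return socket.gethostbyname_ex(host)[2]


def is_verified_crawler(ip, claimed, reverse=reverse_lookup, forward=forward_lookup):
    """Check a client that claims to be a known crawler.

    Step 1: the PTR record must end in a suffix owned by the vendor.
    Step 2: that hostname must resolve back to the same IP, because
            anyone can set an arbitrary PTR record on their own range.
    """
    suffixes = CRAWLER_SUFFIXES.get(claimed.lower())
    if not suffixes:
        return False
    try:
        host = reverse(ip).rstrip(".").lower()
        if not host.endswith(suffixes):
            return False
        return ip in forward(host)
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses;
        # treat any lookup failure as "not verified".
        return False
```

You would call `is_verified_crawler(client_ip, "googlebot")` only when the User-Agent header claims to be Googlebot, and cache the result, since DNS lookups on every request get expensive. The resolvers are passed in as parameters so the logic can be exercised without network access.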

But the converse is not true? There is no guarantee the crawler is not amassing data for model training, or that a crawler (AI or otherwise) does not disguise itself as a normal user?
Yeah, but traffic appearing to come from normal users can be throttled and/or CAPTCHA'd while still allowing Google and Bing to crawl to their hearts' content, so your SEO isn't affected.
I would think rate-limiting would be good. Crawlers are not patient enough to operate at the speed of a real human user.
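To make the throttling idea concrete, here is a minimal per-client token bucket. It is only a sketch: the class name, the parameters, and keying by client IP are my own assumptions, and a real deployment would more likely do this in the reverse proxy or CDN than in application code.

```python
import time


class TokenBucket:
    """Per-client token bucket: each request costs one token."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.clock = clock        # injectable so it can be tested
        self.buckets = {}         # client key -> (tokens, last_seen)

    def allow(self, key):
        """Return True if the client identified by `key` may proceed."""
        now = self.clock()
        # New clients start with a full bucket.
        tokens, last = self.buckets.get(key, (self.capacity, now))
        # Refill for the time elapsed since the last request, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[key] = (tokens - 1, now)
            return True
        self.buckets[key] = (tokens, now)
        return False
```

Something like `limiter = TokenBucket(rate=1.0, capacity=10)` with `limiter.allow(client_ip)` on each request lets a human-paced visitor through while a crawler hammering the site runs out of tokens quickly. Requests that are refused could get a 429 or a CAPTCHA, and verified search engine crawlers could skip the check entirely.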
Greedy crawlers will use fake user-agent strings.