Hacker News
by red_admiral 640 days ago
The website owner chooses. They can say "nope" in robots.txt. Not everyone respects this, but Google does. Google can choose not to show that site as a result, if they want to.

This adds a third option besides yes and no, which is "here's my price". Also, because Cloudflare is involved, bots that just ignore a "nope" might find their lives a bit harder.


Robots.txt is for crawlers. It's explicitly not meant to say one-off requests from user agents can't access the site, because that would break the open web.
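For what it's worth, here's roughly what the crawler side looks like. This is a minimal sketch using Python's standard-library `urllib.robotparser`; the robots.txt contents and the `ExampleAIBot` user-agent name are hypothetical. A crawler that honours the file checks it before fetching, and the file can single out one named crawler while leaving everyone else alone:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: says "nope" to one named crawler, allows all others.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() also marks the rules as loaded


def crawler_may_fetch(user_agent: str, url: str) -> bool:
    """Return True if the parsed robots.txt permits this crawler to fetch url."""
    return parser.can_fetch(user_agent, url)


print(crawler_may_fetch("ExampleAIBot", "https://example.com/article"))  # False
print(crawler_may_fetch("SomeOtherBot", "https://example.com/article"))  # True
```

Nothing here is enforced by the server: the check only happens if the client chooses to run it, which is why a crawler that ignores the file, or a one-off request made on a person's behalf, isn't stopped by it.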

Yep, there are really two parts to this.

* Some company's crawler they're planning to use for AI training data.

* User agents that make web requests on behalf of a person.

Blocking the second one because the user's preferred browser is ChatGPT isn't really in keeping with the hacker spirit. The client shouldn't matter; I would hope the web is made to be consumed by more than just Chrome.