by cjf101
626 days ago
There was a bunch of reporting on how AI companies and researchers were using tools that ignored robots.txt. It's a "polite request" that these companies had a strong incentive to ignore, so they did. That incentive is still there, so it is likely that some of them will continue to do so.

If we're thinking of the same reporting, it was based on a claim by TollBit (a content licensing startup), which was in turn based on the fact that "Perplexity had a feature where a user could prompt a specific URL within the answer engine to summarize it". Tools acting as a user agent (like archive.today, a webpage-to-PDF site, or a translation site) aren't crawlers, and their actions aren't what robots.txt is designed for, but either way the feature is disabled now.
[0]: https://commoncrawl.org/faq
[1]: https://platform.openai.com/docs/bots
[2]: https://support.anthropic.com/en/articles/8896518-does-anthr...
[3]: https://blog.google/technology/ai/an-update-on-web-publisher...