by LinuxBender
1007 days ago
I suspect not many website operators/developers are aware this exists. Use of robots.txt is unenforceable and would only signal intent to OpenAI. It would not help against other LLMs, as Google, Bing and other search engines already have decades of ingested data to feed theirs. In my poor armchair-quarterback opinion, if people do not want something crawled, they must make a best effort to ensure only humans are accessing it: strong authentication, best-effort bot detection, and binding legal contracts with punitive terms for using the data in ways that were not approved. Then they have to actually follow through with legal action for breach of contract.
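To make the "intent only" point concrete, here is a rough sketch of how a well-behaved crawler would consult robots.txt before fetching. The rules below and the `is_allowed` helper are hypothetical, and the `GPTBot` user-agent token is an assumption since the comment does not name one. It uses Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. "GPTBot" is an assumed user-agent token,
# not something stated in the comment above.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The check runs on the crawler's side; the server never sees it.
    print(is_allowed("GPTBot", "https://example.com/page"))
    print(is_allowed("SomeOtherBot", "https://example.com/page"))
```

Nothing here stops a crawler that skips the check or sends a different user-agent string, which is why the file can only express intent.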
Companies like OpenAI also have to do a lot to ensure compliance with regulations.