by raxxorraxor
839 days ago
I guess if you selectively allow crawlers that promise not to use the data that way, robots.txt is still the way to go. Otherwise you need to selectively allow certain bots. However, just as with web crawlers, respecting robots.txt is optional. The insidious part with AI models is that it is difficult or practically impossible to prove that a model was trained on your data. That makes it difficult to establish a standard like robots.txt. There was also .well-known/security.txt, which Google proposed. Some sites serve it, but it hasn't really become a standard.
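
To make the "selectively allow" idea concrete, here is a rough sketch of a robots.txt that admits one crawler and turns everyone else away, checked with Python's standard urllib.robotparser. The bot names (FriendlyBot, SomeOtherBot) and the example.com URL are made up for illustration. As noted above, this only describes what a well-behaved crawler would do; nothing enforces it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: one named crawler is allowed, all others are not.
# "FriendlyBot" is an invented name, not a real crawler.
ROBOTS_TXT = """\
User-agent: FriendlyBot
Allow: /

User-agent: *
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A compliant crawler would ask this question before fetching a page.
# A non-compliant one simply never asks.
allowed = parser.can_fetch("FriendlyBot", "https://example.com/page")
blocked = parser.can_fetch("SomeOtherBot", "https://example.com/page")

print("FriendlyBot may fetch:", allowed)
print("SomeOtherBot may fetch:", blocked)
```

The check runs entirely on the crawler's side, which is why the file works as a request rather than as access control.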