|
|
|
|
|
by ssharp
1225 days ago
|
|
The training model data sets have inconsistent respect for robots.txt. Also, I believe most of these models are not continuously crawling websites to update their data like a search engine does. That means if you're crawled once, you may not be crawled again and you'll still be in the datasets. I'd also argue that Google directing traffic to your website is a good alignment of incentives. ChatGPT spitting out answers derived from your work with nothing given back to you in return is not. |
|