Hacker News
by ssharp 1225 days ago
The datasets used to train these models respect robots.txt inconsistently. Also, I believe most of these models are not continuously re-crawling websites to refresh their data the way a search engine does. That means if you were crawled once, you may never be crawled again, and you'll still be in the datasets.

I'd also argue that Google directing traffic to your website is a good alignment of incentives. ChatGPT spitting out answers derived from your work with nothing given back to you in return is not.

1 comment

I bet that fully half the time, I read the Google answer, click on nothing and go on my way.
That's still better than 0%.