Hacker News
by karmasimida 1263 days ago
There are some issues with this.

SEO is basically tricking machine learning models into ranking you higher. And if someone posts spam, this clause shall not and will not prevent it from being labelled as spam in a service provider's model.

I can see a case for prohibiting generative models specifically, but commercial software would avoid incorporating that data into its code in the first place regardless, since it is contaminating.

In the end, I think the bigger question is still: what is forbidding machine learning models from training on your data for? Why is it bad? There is already enough data out there that is free to train on, so if this is meant to prevent those models from continuing to exist, I doubt it will work.