Hacker News
by martin_a 1173 days ago
By indexing and training on everything it can find in the internet?!

To explain this further: OpenAI et al. (as commercial products) are being trained on content that is published under licenses that allow non-commercial use only. Do those systems respect these licenses? It doesn't look like it. "AI companies" need to follow the law, but since nobody is able to look inside their black boxes, we can't verify that they do. That's where legislation like this comes from.


> By indexing and training on everything it can find in the <PUBLIC> internet?!

and that's bad because?

I would see the point if they were training on my private data I entrusted to somebody and they illegally obtained it without my permission. Are they doing that?

See my edit: They will ignore licensing information and train on the data, possibly including privacy-related information, without any regard for it.

See this: https://news.ycombinator.com/item?id=32573523

What kind of "privacy related information"? This is data on the open internet!
They don't copy and reproduce the data. They change it enough that the licence no longer has any say. It's called fair use.
Fair Use is a US-specific notion and doesn’t exist in that form in most other countries.