If it’s on the open internet then why should they have to do that? How is OpenAI training on articles fundamentally different from the Wayback Machine storing them? They’re just getting stored in a different form.
That's like saying that taking a sample from a song and putting it in a remix is just a different way of storing the original song. If I cut the original song up just right and put enough samples across different songs...
The issue here really lies in "yeah so how actually DOES this make a difference" legally speaking. It just seems unfair that I'm not allowed to copy-paste a text verbatim or upload a movie to YouTube that I don't own the copyright for, yet OpenAI can happily commercialize content that has been sorted and rated for quality by someone else.
The question is "are they allowed to do this under the umbrella of current copyright legislation?" and the answer has far-reaching implications.
Copyright law doesn't ban people from reading the source altogether.
That is what is being proposed: banning AI from being allowed to 'read' the content.
The real argument is how a human brain aggregates knowledge and then profits from it, and whether that is really so different from an AI model aggregating knowledge.
They both read in data, perform calculations on the data, and spit out something.