Hacker News
by evilduck 770 days ago
I don't think that's what I was driving at. Monolith users in this scenario would be knowingly using copyrighted content with the clear intent to "de-copyright" it for distribution by mixing it into a new output via a reversible process. That probably violates copyright, because the intent is to distribute a copyrighted work even if the process makes programmatic detection difficult during distribution. It may operate within the wording of the law, but it is clearly being done in bad faith against the spirit of the law. It also looks a lot like standard file encryption of a copyrighted work where you publicly distribute the decryption key, and transmitting a copyrighted work over TLS today doesn't absolve anyone of liability. You seem to be suggesting that this is what OpenAI has done via the transformer model training process, and that they acted in bad faith. That is certainly possible, but it won't be proven unless their court case reveals it. I'm asking about the opposite: what if they acted in good faith?

What I'm getting at is that it's plausible for an LLM to be trained purely on material that was available and licensed as Creative Commons, and for that data to still contain copyrighted content because a contributor lied about their right to provide it under a Creative Commons license. For example, StackOverflow user UnicornWitness24 commits the copyright violation by copying a NYT article into a reply to bypass a paywall for other users, and has now poisoned a dataset. So I'm asking: what is the civil liability for copyright violations if the defendant was the one who was actually defrauded or deceived, and was acting in good faith and within the bounds of the law at the time?