by theamk 870 days ago
There is a nice essay from 2004 that answers that question, "What Colour are your bits?" (https://ansuz.sooke.bc.ca/entry/23, discussion https://news.ycombinator.com/item?id=24917679)

It talks about copyright infringement in music, but it applies just as well to AI training. Just substitute "model weights" for "scrambled file":

> The scrambled file still has the copyright Colour because it came from the copyrighted input file. It doesn't matter that it looks like, or maybe even is bit-for-bit identical with, some other file that you could get from a random number generator. It happens that you didn't get it from a random number generator. You got it from copyrighted material; it is copyrighted. The randomly-generated file, even if bit-for-bit identical, would have a different Colour. The Colour inherits through all scrambling and descrambling operations and you're distributing a copyrighted work,