| HN Mirror



	by wongarsu 1855 days ago
	Generally, an ML model transforms the copyrighted material to the point where it isn't recognizable, so it should be treated as its own unrelated work that isn't infringing or derivative. But then you have e.g. GPT that is reproducing some (largeish) parts of the training set word-for-word, which might be infringing. Also, I don't think there have been any major court cases about this, so there's no clear precedent in either direction.

3 comments

visarga 1855 days ago

> But then you have e.g. GPT that is reproducing some (largeish) parts of the training set word-for-word, which might be infringing.

Easy fix: keep a Bloom filter of hashed n-grams from the training set, ensuring you don't repeat more than N consecutive words from it.
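A minimal sketch of that idea, assuming whitespace tokenization and a decoder that can veto a candidate next word. The names `BloomFilter`, `ngrams`, `build_filter` and `would_copy`, and the filter sizes, are hypothetical and chosen for illustration. Every training n-gram is hashed into the filter; during generation, a candidate word is rejected if it would complete an n-gram the filter has seen. To allow at most N copied words in a row, set `n = N + 1`.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: each item sets num_hashes bits in a num_bits array."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # May return a false positive, never a false negative.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def ngrams(words, n):
    """All contiguous n-word windows, joined into strings."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def build_filter(training_docs, n):
    """Hash every n-gram of every training document into one filter."""
    bf = BloomFilter()
    for doc in training_docs:
        for gram in ngrams(doc.split(), n):
            bf.add(gram)
    return bf


def would_copy(generated_words, candidate, bf, n):
    """True if appending `candidate` would complete an n-gram seen in training."""
    window = (list(generated_words) + [candidate])[-n:]
    return len(window) == n and " ".join(window) in bf


if __name__ == "__main__":
    bf = build_filter(["the quick brown fox jumps over the lazy dog"], n=4)
    so_far = ["I", "saw", "the", "quick", "brown"]
    print(would_copy(so_far, "fox", bf, 4))  # True: "the quick brown fox" is in training
    print(would_copy(so_far, "cat", bf, 4))  # False, barring a Bloom-filter false positive
```

The Bloom filter trades memory for occasional false positives. A novel n-gram may sometimes be blocked, but a verbatim training n-gram (under this tokenization) is never missed. It only catches exact runs, not near-verbatim paraphrases.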

pabs3 1855 days ago

Some people say that the Google Books court case is a precedent for ML models; if you search back through my comment history, you will find links.

sodality2 1855 days ago

Thanks!