Hacker News
by mysterydip 1811 days ago
Makes me wonder what would happen if a similar thing were done with books. If I train an AI on all the texts of Tom Clancy, or Stephen King, or every Star Wars novel, and the books it generates every so often reproduce paragraphs verbatim from one of those sources, would copyright owners be up in arms? What would the distinction be between the code case and the text case?
2 comments

I am not a lawyer. I do photography and have a more than passing interest in copyright as it applies to the photographs I take and the material I photograph.

Copyright on art gets more interesting / fuzzier. The key part is substantial similarity - https://en.wikipedia.org/wiki/Substantial_similarity and https://www.photoattorney.com/copyright-infringement-for-sub...

Rather than text, my AI copyright hypothetical... consider a model created based on sunset photographs. You take a regular photograph, pass it through the model, and it transforms it into a sunset. The model was trained on copyrighted works but the model is considered fair use.

Now, I go and take a photograph from some location during the day and then pass it through the transformer and get a sunset. Yay me! Unbeknownst to me, that location is a favorite spot for photographers, and sunsets shot from that location were used in the training data. My photograph, transformed to look like a sunset, is now similar to one of the photographs in the training data.

Is my transformed photograph a derivative work of the one in the training data to which it bears similarity? How would a judge feel about it? How does the photographer whose photograph was used in the training data feel?

What would be interesting in that case would be how the transformed image would look if photos from that location were removed from the training set. That would help reveal whether it was just copying what it had seen or it actually remembered what sunsets looked like and transformed the image using its memory of sunsets in general.

This will surely happen within the next few years, but if the "new work" contains a full paragraph from an existing novel, the copyright hammer would come down hard.

Maybe it needs to be paired with another network / hunk of code that checks for verbatim copying?
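
For what it's worth, the "hunk of code" version of that check could be fairly small. Here is a rough sketch, not how any real system does it: it flags generated text that shares a run of n consecutive words with any source text. The function names, the 8-word threshold, and the idea that the training corpus fits in memory as a list of strings are all assumptions made up for illustration.

```python
import re


def word_ngrams(text, n):
    """Return the set of n-word sequences in text (lowercased, punctuation ignored)."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def find_verbatim_overlap(generated, corpus, n=8):
    """Return every n-word sequence in `generated` that also occurs in a corpus text.

    `corpus` is a hypothetical list of source strings; n=8 is an arbitrary
    threshold chosen for this sketch, not a legal or empirical standard.
    """
    generated_ngrams = word_ngrams(generated, n)
    hits = set()
    for source in corpus:
        hits |= generated_ngrams & word_ngrams(source, n)
    return hits


def looks_copied(generated, corpus, n=8):
    """True if at least one n-word run in `generated` matches a source verbatim."""
    return bool(find_verbatim_overlap(generated, corpus, n))


if __name__ == "__main__":
    corpus = [
        "It was a dark and stormy night; the rain fell in torrents "
        "except at occasional intervals."
    ]
    print(looks_copied("He wrote: it was a dark and stormy night the rain fell heavily.", corpus))
    print(looks_copied("The night was stormy and dark.", corpus))
```

This only catches exact word-for-word runs. It says nothing about paraphrase or substantial similarity, which is the fuzzier question raised in the other comment.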