by davidw 124 days ago
The elephant in the room is that all these LLMs were trained on boatloads of open source software, which they can remix just enough to avoid violating any copyrights.

As an open source contributor, in some ways this makes me much more frustrated than someone making a closed source fork of a BSD licensed project.


My take for a very long time has been that any model trained in violation of copyright should not itself be copyrightable. It should be public domain.

This would mean that any trainer who lacked permission to create a derivative work, whether implied by the work's current license or obtained directly, would have to release the model's weights.

You could argue that it's fair use, but a fair-use quotation of a work does not become the property of the one quoting it. If I quote a line from a song or a novel, I do not now own the rights to that line. So there's precedent for this.

Isn't all content generated by generative models already in the public domain? Having something in the public domain doesn't force you to release it.
At the very least, that would be fair.

I feel like legal frameworks sometimes lose track of fairness.