by breve 1 hour ago
Copyright issues don't seem to be addressed by any large language model provider.

If an LLM is trained on GPL code, then that code has become an intrinsic part of the model (because if it hasn't, what was the value of training on it?). So shouldn't that model now also be licensed under the GPL?

And how do I know the LLM output isn't reproducing substantial chunks of GPL'd code, which would make my code GPL?

3 comments

Or alternatively: an LLM is not human, and non-human-generated content has no copyright protection. That would mean all generative model output is automatically in the public domain.
GitHub Copilot has filters for enterprise customers that remove GPL code before it gets returned. At least, that’s how my company has been covering itself.
Maybe, but multiply this by N licenses: any given output may draw on ideas from all of them.

Law is probably going to take a while to catch up here.