by alkonaut 1132 days ago
That someone reads my code I expect. That someone reads my code and uses it to train a machine they make money off I didn't expect, but I also can't say I object.

However, that part of the argument feels like the less interesting CoPilot legal argument. The interesting one is: what's the license for use of the code it spits out? Any time CoPilot spits out a piece of code that a) exists verbatim on GitHub and b) is nontrivial enough to be copyrightable, then what happens? Just because it was chewed through the machine doesn't magically wipe the original GPL/MIT/BSD license it had on GitHub. CoPilot doesn't represent a "clean room".

Large companies tend to be extremely skittish about devs using IP they don't have rights to. I lived under a rule of "No open source licensed thing, at all, anywhere" for years in the early 2000s. Later the rules were relaxed, and obviously everyone uses MIT/BSD type stuff in commercial products these days, but management is still nervous about things like Stack Overflow answer code being copied verbatim (still verboten). So how can CoPilot, if I understand things correctly, be allowed or encouraged at such places now? Wouldn't exactly the same worry about nontrivial Stack Overflow snippets apply to CoPilot-produced code?

1 comment

And if indeed it's treated as clean room, does open source need to just pack it in? Are all of our licenses rendered unenforceable?
It feels like there is zero chance it could be used as some sort of blanket copyright cleaner. If it is, then I'll make my own "model" (OK, a 2-line Python script) that produces royalty-free bestseller novels if you just prompt it with the title (its training is extremely simple: it just responds with the content of the book file with the same filename!). The fact that we don't quite understand the black box in an LLM, and that the novels are chopped into tokens, doesn't change that IF the tokens are strung back together, they are still the same paragraph.
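
For the curious, that "model" could look roughly like the sketch below. The function names, the `books` directory and the `.txt` extension are all made up for illustration, and the whitespace split is only a stand-in for a real tokenizer. The second half shows the token point: chopping the text up and joining it again gives back exactly the same text.

```python
from pathlib import Path


def model(title: str, library: Path = Path("books")) -> str:
    # "Training" is having the file on disk; "inference" is reading it back.
    # Assumes a hypothetical layout of books/<title>.txt.
    return (library / f"{title}.txt").read_text(encoding="utf-8")


def tokenize(text: str) -> list[str]:
    # Toy tokenizer: split on single spaces (not how a real LLM tokenizes).
    return text.split(" ")


def detokenize(tokens: list[str]) -> str:
    # Joining on the same separator is the exact inverse of tokenize().
    return " ".join(tokens)
```

Prompt it with a title that exists in the directory and you get the whole book back, and `detokenize(tokenize(book)) == book` holds for any input, so the round trip through "tokens" changes nothing about what the output is.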