by kamkazemoose 1781 days ago
> This is exactly what I was thinking about. If Copilot is fair use, it means that all proprietary source code, as long as it's publicly available to read, will be free to use as training material for a hypothetical free and open source machine learning project, which I think would be a good thing. An example is a proprietary program released under a restrictive "source available" license: you can read it but not reuse it under any circumstances (and I believe these projects are already included in Copilot's training data). This is why I said fair use can be a good thing, and a ruling that reduces the scope of fair use could be used by proprietary software vendors against the FOSS community.

FWIW this seems to be the current interpretation of copyright laws when it comes to machine learning, at least in the US. The only questions I've really seen about the legality of Copilot are about it reproducing code and whether that reproduction is fair use or not. But few are arguing that training the model itself on any available source is violating fair use.

1 comment

> FWIW this seems to be the current interpretation of copyright laws when it comes to machine learning, at least in the US.

I think this is a sensible take. An AI should be able to learn to program from any source code it can see, just like a human.

> But few are arguing that training the model itself on any available source is violating fair use.

People argue this all the time on HN.

But these same people seem to believe Copilot is just pasting together bits of code it has seen before, so I suspect they don't have the technical or legal understanding to comment sensibly.