by api 1792 days ago
What about non-traditional-FOSS licenses? There is a lot of source-available, non-OSI-compliant software on GitHub, like MongoDB, CockroachDB, etc., and that's clearly proprietary. If this thing is trained on that and generates what amounts to snippets of that code, then it's clearly violating those licenses.

Then there's private repositories. If they included those in the training data set that's even more actionable.

Personally I think this is software piracy at an absolutely unprecedented scale. Machine learning is just information transfer from the training data into weights in a model, a close relative of lossy data compression. Microsoft is now reselling all its GitHub users' code for profit.

1 comment

Per GitHub, private repositories weren't included in the training data, only public repos.

This really doesn't give me much comfort though. Making a repo public doesn't imply anything about its license; it could still be "All rights reserved".