by zaptheimpaler 1804 days ago
Microsoft just stole all the code on GitHub to do this. Regardless of what the minutiae of the law say, no one really expected their work to be used this way. Open source code already powers a huge chunk of the industry while capturing little value for its maintainers. GitHub even explicitly supports a standard format for declaring the license of a repo, which was cleverly ignored.

Here is the relevant section from GitHub's terms of service [1]:

> 6. Contributions Under Repository License

> Whenever you add Content to a repository containing notice of a license, you license that Content under the same terms, and you agree that you have the right to license that Content under those terms. If you have a separate agreement to license that Content under different terms, such as a contributor license agreement, that agreement will supersede.

From GPLv2, "When distributing derived works, the source code of the work must be made available under the same license."

------

This is not about technology. It is a legal end run that allows using open source code without open sourcing the derived work. It is using AI as a form of "license laundering".

"OpenAI" is not open at all. Truly open AI means the code, the data and the model are all open. OpenAI sold the source to GPT-3 to Microsoft, received $1 billion from them in 2019 and does not make most of their work available except behind a highly exclusive, paid API - https://beta.openai.com/pricing/. It's a joke to call that "open". I urge you to read up on OpenAI and look at what they have actually done.

Their plan in the future is to sell access to Copilot, directly monetizing work they stole from others for free:

> According to GitHub, “If the technical preview is successful, our plan is to build a commercial version of GitHub Copilot in the future.”

I've deleted all my code from GitHub and hope others do the same. Maybe if some higher-profile project starts doing this, we can start to organize around opposing Copilot and OpenAI.

Others have also pointed out similar concerns - see https://news.ycombinator.com/item?id=27687450 for example.

[1] https://docs.github.com/en/github/site-policy/github-terms-o...

[2] https://beta.openai.com/pricing/

1 comments

It’s a shame that Copilot would not be possible without all the zillions of hours of work that went into writing that code, while the authors of that training data get zero compensation for their contribution to Copilot (and zero ability to opt out).
I'm guessing that since there are hundreds of millions of repositories, the typical marginal value of someone's contributions would optimistically be on the order of a few dollars. But since the consensus on HN is that developers spend very little time actually coding and there is no use case for Copilot, perhaps it is worth a lot less.
If I stole just $0.50 from every American, the typical marginal value of their contribution is tiny, but I still stole nearly $200M. Maybe none of those people will raise much of a stink because it's just $0.50, but it's just as bad.

Practically, it's bad in that I never got the chance to dictate how they use my code. My GPL code has very little marginal value to my users, but I got to dictate that their work that uses it is also GPL (or they can pay me for a different license). I want that choice when it comes to my work being used as ML training data.

I think it will be great if they can create some mechanism to compensate people for their data. I just suspect many people conflate the value of their data as training data with, say, how much they might charge a client to write some similar code.