|
|
|
|
|
by Aurornis
386 days ago
|
|
Anonymous Internet comment section stories are confused and/or lie a lot, too. I’m not sure why you have so much faith in them. Also, this conspiracy requires coordination across two separate companies (GitHub for the repos and the LLM providers requesting private repos to integrate into training data). It would involve thousands or tens of thousands of engineers to execute. All of them would have to keep the conspiracy quiet. It would also permanently taint their frontier models, opening them up to millions of lawsuits (across all GitHub users) and making them untouchable in the future, guaranteeing their demise as soon a single person involved decided to leak the fact that it was happening. I know some people will never trust any corporation for anything and assume the worst, but this is the type of conspiracy that requires a lot of people from multiple companies to implement and keep quiet. It also has very low payoff for company-destroying levels of risk. So if you don’t trust any companies (or you make decisions based on vague HN anecdotes claiming conspiracy theories) then I guess the only acceptable provider is to self-host on your own hardware. |
|
The key question from the perspective of the company is not whether there will be lawsuits, but whether the company will get away with it. And so far, the answer seems to be: "yes".
The only exception that is likely is private repos owned by enterprise customer. It's unlikely that GitHub would train LLMs on that, as the customer might walk away if they found out. And Fortune 500 companies have way more legal resources to sue them than random internet activists. But if you are not a paying customer, well, the cliche is that you are the product.
[0]: https://cybernews.com/tech/meta-leeched-82-terabytes-of-pira... [1]: https://techcrunch.com/2024/12/11/it-sure-looks-like-openai-...