Hacker News
by sillysaurusx 1005 days ago
I don’t think we can. People are furious at anyone who even touches their data. That fury will shift from OpenAI to the smaller players.

Result: open source AI dies. There’s no way to get any data, and what’s available outside of copyright isn’t enough. Not to compete with ChatGPT.

Sure, there will always be cool models. But nothing like what we were hoping for. Meta is being sued right now precisely because LLaMA was trained on copyrighted books. Open source entities don’t have the legal resources to defend themselves from these threats.

3 comments

Piracy doesn't respect copyright. Why not steal all the great books and works ever made just to feed this model? Like a protocol for free training data that you could torrent around and contribute to the hive. Not well thought out, but that's the idea.
On the contrary, it is possible to train efficient models with largely synthetic data; see Phi-1 and Phi-1.5 from Microsoft.
What if there was nobody to sue? Open source could be structured that way, like MAME.