by olalonde 427 days ago
Where are you seeing that? I just read the definition and it doesn't seem to allow closed components:

> An Open Source AI is an AI system made available under terms and in a way that grant the freedoms to:

> Use the system for any purpose and without having to ask for permission.

> Study how the system works and inspect its components.

> Modify the system for any purpose, including to change its output.

> Share the system for others to use with or without modifications, for any purpose.

https://opensource.org/ai/open-source-ai-definition


They allow a major component of the model, the data, to be withheld.
Not only withheld, but also completely proprietary, neither modifiable nor redistributable.
Nobody owns their training data. They just scrape the internet, or pirate massive troves of books. Merely forcing companies to license all the data they use, let alone under an open license, would be a massive impediment to the development of open models.
It is definitely doable to get openly licensed data; you just have to collect it through voluntary, crowdsourced data-acquisition programs. For example, the RNNoise model was retrained on such crowdsourced data.
IBM did it with their Granite models.
The data used for training Granite doesn't sound like it would be under FOSS licenses.

https://en.wikipedia.org/wiki/IBM_Granite