by HarHarVeryFunny 664 days ago
You're not really asking for an open source model though, you're asking for open source training data set(s), which isn't something that Meta can give you. There are open source web scrapes such as The Pile, but much of the more specialized data needs to be licensed.

I'm asking for an "Open Source AI", and Meta and everyone supporting them are convinced it's impossible in our lifetimes :( We are living in the Dark Ages where Information = $$$. I pray to AI that we one day grow out of this pointless, destructive economic spiral towards the heat death of the Earth and collect and share open knowledge across all human cultures and history.
Well, as long as by "AI" you are referring to pre-trained transformers, then what you are effectively asking for is the data used to pre-train them.

OTOH, it's not clear why you want the data. You don't need it to run Meta's models for free, or to fine-tune them for your own needs. The only thing the data would allow you to do is pre-train from scratch, in other words to obtain the exact same set of weights that Meta is giving you for free.