by speedgoose 984 days ago
I would have Wikipedia and a dump of some of the most important research papers (from Sci-Hub?).

If size isn't a limit, I'd add a copy of the latest Common Crawl dataset.

If size is really restricted and it's only one file, then I would seriously consider LLaMa2 70B.

It hallucinates, but in terms of knowledge packed into about 100GB, I don't think you can find anything better.
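As a rough sanity check on that size figure: the weights take up approximately (parameter count) × (bytes per parameter), so the actual file size depends on the precision the model is stored at. This is a minimal sketch assuming a nominal 70 billion parameters and ignoring file-format overhead; the precision table is illustrative, not a list of specific released files.

```python
# Rough on-disk size of model weights: parameters x bytes per parameter.
# Assumption: nominal 70e9 parameters; ignores tokenizer files, metadata,
# and any container-format overhead.
PARAMS = 70e9

# Illustrative storage precisions and their bytes per parameter.
BYTES_PER_PARAM = {
    "fp16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}


def weight_size_gb(params: float, bytes_per_param: float) -> float:
    """Return approximate weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    for precision, bpp in BYTES_PER_PARAM.items():
        print(f"{precision}: ~{weight_size_gb(PARAMS, bpp):.0f} GB")
```

Under these assumptions, full 16-bit weights come to roughly 140 GB, 8-bit to roughly 70 GB, and 4-bit to roughly 35 GB, so "about 100GB" sits between the common storage formats.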


My literal first thought was that, at the very least, someone has surely already collated Sci-Hub. I would consider that perhaps the most essential part of the database.

Your LLaMa2 suggestion is thought-provoking and has real merit. There might be a path forward with something like that, even if only as a neutral, knowledge-steward AI serving as the interface to the database.

AIDBI?