by acuozzo 127 days ago
> an LLM with such little data

There is a mountain of data pre-1905. Certainly enough to train a decent 30B parameter model.

Now, digitizing & OCRing all of that data... THAT is a challenge.