by sunpazed 906 days ago
> While LLM projects typically require an exorbitant amount of resources, it is important to remind ourselves that research does not need to assemble full-fledged massively expensive systems in order to have impact.

Check out TinyLlama: https://github.com/jzhang38/TinyLlama

Four research students from the Singapore University of Technology and Design are pretraining a 1.1B Llama model on 3 trillion tokens using a handful of A100s.

They're also providing the source code, training data, and fine-tuned checkpoints for anyone to run.

1 comments

Even if the run went through without any issues and with zero test runs, it would have taken 35k A100 hours, or $70k-100k. That is not cheap.
I'd agree, but I would argue that it is affordable for a sponsored dissertation program with 3 research students and an associate professor. They're actually still training it!
For one run, yes. But if they are testing a new architecture or something like that, they need at least dozens of runs. If they are not testing a new architecture, finetuning is almost always the way to go.
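
For anyone who wants to sanity-check the numbers in this thread, here is a rough sketch in Python. It is not how the commenter derived the figure. It assumes the common 6 * parameters * tokens approximation for training FLOPs, the A100's 312 TFLOP/s dense BF16 peak, and a guessed 50% hardware utilization. The hourly rates are also assumptions, back-derived from the quoted $70k-100k over 35k hours, and the run count of 24 is just an illustrative stand-in for "dozens".

```python
# Back-of-the-envelope pretraining cost using the figures quoted in the thread.
PARAMS = 1.1e9               # model size from the post (1.1B)
TOKENS = 3e12                # training tokens from the post (3 trillion)
A100_PEAK_FLOPS = 312e12     # A100 dense BF16 peak, in FLOP/s
ASSUMED_UTILIZATION = 0.5    # assumption: fraction of peak actually achieved
ASSUMED_RATES_USD = (2.00, 2.85)  # assumption: low/high price per A100 hour
ASSUMED_RUNS = 24            # assumption: illustrative value for "dozens" of runs


def gpu_hours(params, tokens, peak_flops, utilization):
    """Estimate GPU hours with the 6 * params * tokens FLOPs approximation."""
    total_flops = 6 * params * tokens
    seconds = total_flops / (peak_flops * utilization)
    return seconds / 3600


def cost_range(hours, rates, runs=1):
    """Return (low, high) cost in USD for the given number of full runs."""
    low_rate, high_rate = rates
    return (hours * low_rate * runs, hours * high_rate * runs)


hours = gpu_hours(PARAMS, TOKENS, A100_PEAK_FLOPS, ASSUMED_UTILIZATION)
one_run = cost_range(hours, ASSUMED_RATES_USD)
many_runs = cost_range(hours, ASSUMED_RATES_USD, runs=ASSUMED_RUNS)

print(f"A100 hours per run: {hours:,.0f}")
print(f"One run: ${one_run[0]:,.0f} to ${one_run[1]:,.0f}")
print(f"{ASSUMED_RUNS} runs: ${many_runs[0]:,.0f} to ${many_runs[1]:,.0f}")
```

Under those assumptions, a single run comes out close to the 35k A100 hours and $70k-100k quoted above, and two dozen full runs would land in the low millions. Change the utilization or the hourly rate and the totals move proportionally, so treat the output as an order-of-magnitude check only.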