HN Mirror


by sunpazed 953 days ago

> While LLM projects typically require an exorbitant amount of resources, it is important to remind ourselves that research does not need to assemble full-fledged massively expensive systems in order to have impact.

Check out TinyLlama: https://github.com/jzhang38/TinyLlama

Four research students from the Singapore University of Technology and Design are pretraining a 1.1B Llama model on 3 trillion tokens using a handful of A100s.

They're also providing the source code, training data, and fine-tuned checkpoints for anyone to run.

1 comments

YetAnotherNick 952 days ago

Even if they ran it with no issues and zero testing, it would have taken roughly 35k A100-hours, or $70k-100k. That is not cheap.
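To make the arithmetic behind that estimate explicit, here is a rough sketch in Python. The 35k A100-hours and the $70k-100k range come from the comment above. The 16-GPU, 90-day schedule is an assumption based on what the project README reportedly targets, and the function names are made up for illustration.

```python
def total_gpu_hours(num_gpus: int, days: float) -> float:
    """GPU-hours for a run that keeps num_gpus busy around the clock for the given days."""
    return num_gpus * days * 24


def implied_hourly_rate(total_cost_usd: float, gpu_hours: float) -> float:
    """Price per GPU-hour implied by a total cost and a GPU-hour budget."""
    return total_cost_usd / gpu_hours


if __name__ == "__main__":
    # Assumption: 16 A100s for about 90 days (not stated in this thread).
    hours = total_gpu_hours(num_gpus=16, days=90)
    print(f"GPU-hours: {hours:,.0f}")  # close to the 35k quoted in the comment

    # Figures from the comment: 35k A100-hours costing $70k-100k.
    low = implied_hourly_rate(70_000, 35_000)
    high = implied_hourly_rate(100_000, 35_000)
    print(f"Implied price: ${low:.2f}-${high:.2f} per A100-hour")
```

Under those assumptions the schedule works out to 34,560 A100-hours, and the quoted cost range implies roughly $2.00 to $2.86 per A100-hour. Actual cloud or on-prem pricing may differ.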


sunpazed 952 days ago

I'd agree, but I'd argue it's affordable for a sponsored dissertation program with 3 research students and an associate professor. They're actually still training it!


YetAnotherNick 952 days ago

For one run, yes. But if they are testing a new architecture or something like that, they need at least dozens of runs. If they are not testing a new architecture, fine-tuning is almost always the way to go.
