Hacker News new | ask | show | jobs
by vishal0123 1210 days ago
LLAMA made tradeoff for reducing parameter budget instead of training computation budget. This is better for inference computation budget.

Optimal number of tokens for 7B parameters is around 140B tokens[0], and meta trained it for trillion tokens.

[0]: https://arxiv.org/pdf/2203.15556.pdf