| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by vishal0123 1210 days ago

LLAMA made tradeoff for reducing parameter budget instead of training computation budget. This is better for inference computation budget.

Optimal number of tokens for 7B parameters is around 140B tokens[0], and meta trained it for trillion tokens.