|
|
|
|
|
by vishal0123
1210 days ago
|
|
LLAMA made tradeoff for reducing parameter budget instead of training computation budget. This is better for inference computation budget. Optimal number of tokens for 7B parameters is around 140B tokens[0], and meta trained it for trillion tokens. [0]: https://arxiv.org/pdf/2203.15556.pdf |
|