by jwan584
996 days ago
A helpful paper laying out the full recipe Cerebras uses to train LLMs, including:
- Extensively deduplicated dataset (SlimPajama)
- Hyperparameter search using muP (Maximal Update Parametrization)
- Variable sequence length training + ALiBi
- Aggressive LR decay
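For anyone unfamiliar with ALiBi, here is a minimal sketch of the attention bias it adds, following the original ALiBi formulation (not Cerebras' own code, which the post doesn't show). It assumes a power-of-two head count and a causal decoder; the function names are made up for illustration. Because the bias depends only on the query/key distance and there are no learned position embeddings, the same function works for any sequence length, which is plausibly why it pairs well with variable sequence length training.

```python
import math


def alibi_slopes(num_heads: int) -> list[float]:
    """Per-head slopes as in the ALiBi paper; this sketch assumes num_heads is a power of two."""
    if num_heads <= 0 or num_heads & (num_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    # Head h (1-indexed) gets slope 2^(-8h/H): for H=8 that is 1/2, 1/4, ..., 1/256.
    return [2.0 ** (-8.0 * h / num_heads) for h in range(1, num_heads + 1)]


def alibi_bias(seq_len: int, slope: float) -> list[list[float]]:
    """Causal ALiBi bias for one head.

    Entry [i][j] is -slope * (i - j) when key j is at or before query i,
    and -inf otherwise (the causal mask). It is added to q.k / sqrt(d)
    before the softmax, so more distant tokens are penalized linearly.
    """
    bias = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            row.append(-slope * (i - j) if j <= i else -math.inf)
        bias.append(row)
    return bias


if __name__ == "__main__":
    slopes = alibi_slopes(8)
    print(slopes[:3])            # first three head slopes
    for row in alibi_bias(4, slopes[0]):
        print(row)               # bias matrix for head 1 at length 4
```

Note that nothing in `alibi_bias` is tied to a maximum context length: a batch of length 512 and a batch of length 2048 just call it with different `seq_len` values.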