by filterfiber
927 days ago
> Hardware: StableLM Zephyr 3B was trained on the Stability AI cluster across 8 nodes with 8 A100 80GBs GPUs for each nodes.

I might be missing it, but do they say how many training tokens were used to train this? That would help efforts like TinyLlama in figuring out how well scaling works with training tokens versus parameter count, and in challenging the Chinchilla model.
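
As a rough illustration of the comparison being asked about, here is a minimal sketch of the tokens-per-parameter arithmetic. It assumes the commonly quoted Chinchilla rule of thumb of roughly 20 training tokens per parameter, which is an approximation rather than an exact law, and it uses TinyLlama's publicly stated target of 1.1B parameters and 3T tokens. The function names are hypothetical, and StableLM Zephyr 3B is deliberately left out because its token count is the open question here.

```python
# Rule-of-thumb ratio often attributed to the Chinchilla paper.
# Assumption: ~20 tokens per parameter; treat it as approximate.
CHINCHILLA_TOKENS_PER_PARAM = 20.0


def tokens_per_param(n_params: float, n_tokens: float) -> float:
    """Return how many training tokens were seen per model parameter."""
    return n_tokens / n_params


def chinchilla_multiple(n_params: float, n_tokens: float) -> float:
    """Return the token budget as a multiple of the ~20 tokens/param heuristic."""
    return tokens_per_param(n_params, n_tokens) / CHINCHILLA_TOKENS_PER_PARAM


# TinyLlama's stated goal: a 1.1B-parameter model trained on 3T tokens.
tinyllama_ratio = tokens_per_param(1.1e9, 3e12)
tinyllama_multiple = chinchilla_multiple(1.1e9, 3e12)

print(f"TinyLlama: {tinyllama_ratio:.0f} tokens/param, "
      f"about {tinyllama_multiple:.0f}x the Chinchilla heuristic")
```

The same two functions could be applied to StableLM Zephyr 3B once its training token count is known.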
https://stability.wandb.io/stability-llm/stable-lm/reports/S...