by anthonix1
736 days ago
Final loss from that fineweb-10B run (since then I'm up to ~100k toks/sec/GPU):

step 18865/18865 | train loss 3.280550 | norm 0.4362 | lr 0.00e+00 | 1669.06 ms | 55.4% A100 fp16 MFU | 314058 tok/s
Writing state to log124M/state_00018865_00003.bin
val loss 3.296179

You can buy these GPUs on Amazon for under $1k. I heard the MI300X may be available in Azure now, or at least very soon.
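
As a rough sanity check on the final log line, the step time and throughput can be multiplied back out to an approximate tokens-per-step figure and a total token count. This is only a sketch: it assumes the reported tok/s is the aggregate across all GPUs and is simply tokens per step divided by step time, and the helper name `parse_step_line` is made up for illustration.

```python
import re

# Final step line copied from the log above.
LOG_LINE = (
    "step 18865/18865 | train loss 3.280550 | norm 0.4362 | lr 0.00e+00 | "
    "1669.06 ms | 55.4% A100 fp16 MFU | 314058 tok/s"
)


def parse_step_line(line):
    """Hypothetical helper: extract total steps, step time (ms) and tok/s."""
    steps = int(re.search(r"step (\d+)/(\d+)", line).group(2))
    step_ms = float(re.search(r"([\d.]+) ms", line).group(1))
    tok_per_s = float(re.search(r"([\d.]+) tok/s", line).group(1))
    return steps, step_ms, tok_per_s


steps, step_ms, tok_per_s = parse_step_line(LOG_LINE)

# Assumption: tok/s = tokens per step / step time, summed over all GPUs.
tokens_per_step = tok_per_s * step_ms / 1000.0
total_tokens = tokens_per_step * steps

print(f"tokens per step ~ {tokens_per_step:,.0f}")
print(f"total tokens    ~ {total_tokens / 1e9:.2f}B")
```

Under that assumption it works out to roughly 524k tokens per step, which is close to 2^19 = 524,288, and about 9.9B tokens over the 18865 steps, in line with a single pass over fineweb-10B. The batch size itself isn't stated in the log, so treat the 2^19 match as suggestive only.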