by anthonix1 736 days ago
Final loss from that fineweb-10B run (since then I'm up to ~100k toks/sec/GPU):

step 18865/18865 | train loss 3.280550 | norm 0.4362 | lr 0.00e+00 | 1669.06 ms | 55.4% A100 fp16 MFU | 314058 tok/s
Writing state to log124M/state_00018865_00003.bin
val loss 3.296179
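
As a rough sanity check on that step line, here is a small sketch that parses it and multiplies throughput by step time. It assumes the last step's timing is representative of the whole run, and `parse_step_line` is a made-up helper for illustration, not part of the training code. It works out to roughly half a million tokens per step and just under 10B tokens over the run, which lines up with the fineweb-10B dataset size.

```python
import re

# The step line from the log above, copied verbatim (without the trailing
# "Writing state" and "val loss" messages).
LOG_LINE = (
    "step 18865/18865 | train loss 3.280550 | norm 0.4362 | lr 0.00e+00 | "
    "1669.06 ms | 55.4% A100 fp16 MFU | 314058 tok/s"
)


def parse_step_line(line):
    """Hypothetical helper: split a pipe-delimited step line into named fields."""
    fields = [f.strip() for f in line.split("|")]
    step, total = re.match(r"step (\d+)/(\d+)", fields[0]).groups()
    return {
        "step": int(step),
        "total_steps": int(total),
        "train_loss": float(fields[1].split()[-1]),
        "step_ms": float(fields[4].split()[0]),
        "tok_per_s": float(fields[6].split()[0]),
    }


stats = parse_step_line(LOG_LINE)

# tokens per step = throughput (tok/s) * step time (s)
tokens_per_step = stats["tok_per_s"] * stats["step_ms"] / 1000.0

# Assumption: every step processed about as many tokens as the last one.
total_tokens = tokens_per_step * stats["total_steps"]

print(f"tokens per step ~ {tokens_per_step:,.0f}")
print(f"tokens over the run ~ {total_tokens / 1e9:.2f}B")
```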

You can buy these GPUs on Amazon for under $1k. I heard the MI300X may be available in Azure now or at least very soon.