Hacker News
by anthonix1 745 days ago
So... I successfully reproduced the GPT-2 (124M) run with my version of llm.c [1] in ~8.75 hours, taking about 18 kWh, or $2.70.
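A quick sanity check on those figures, as a sketch. The post doesn't state an electricity rate, so the rate below is only what $2.70 / 18 kWh implies, and the average draw assumes the 18 kWh was spread evenly over the ~8.75 h run.

```python
# Figures quoted in the comment above.
run_hours = 8.75   # approximate wall-clock time of the run
energy_kwh = 18.0  # approximate energy consumed
cost_usd = 2.70    # quoted cost

# Average power draw, assuming the energy was spread over the whole run.
avg_power_kw = energy_kwh / run_hours

# Electricity rate implied by the quoted cost (an inference, not stated in the post).
implied_rate = cost_usd / energy_kwh

print(f"average draw ~{avg_power_kw:.2f} kW")
print(f"implied rate ~${implied_rate:.3f}/kWh")
```

That works out to roughly a 2 kW average draw at about 15 cents per kWh.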

The first run actually failed at around step 3000, and I realized I had a bug in my attention/matmul kernels. After fixing that and restarting, it worked great.

[1] https://github.com/anthonix/llm.c

What was the final loss? Is this hardware available for rent somewhere?
Final loss from that fineweb-10B run (since then I'm up to ~100k toks/sec/GPU):

  step 18865/18865 | train loss 3.280550 | norm 0.4362 | lr 0.00e+00 | 1669.06 ms | 55.4% A100 fp16 MFU | 314058 tok/s
  Writing state to log124M/state_00018865_00003.bin
  val loss 3.296179
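The fields on the final step line can be cross-checked against the headline numbers. This is a minimal sketch, assuming the last step's 1669.06 ms is representative of every step (the log shows only that one step). The regexes are written for this single line's format only, not as a general llm.c log parser.

```python
import re

# Final step line from the log above (pipe-separated fields).
line = ("step 18865/18865 | train loss 3.280550 | norm 0.4362 | lr 0.00e+00 | "
        "1669.06 ms | 55.4% A100 fp16 MFU | 314058 tok/s")

# Pull out the step count, step time (ms) and throughput (tokens/s).
steps = int(re.search(r"step (\d+)/\d+", line).group(1))
step_ms = float(re.search(r"([\d.]+) ms", line).group(1))
tok_per_s = int(re.search(r"(\d+) tok/s", line).group(1))

# Tokens processed per optimizer step = throughput x step time.
tokens_per_step = tok_per_s * step_ms / 1000.0

# Extrapolated totals, assuming every step took as long as the last one.
total_hours = steps * step_ms / 1000.0 / 3600.0
total_tokens = steps * tokens_per_step

print(f"~{tokens_per_step:,.0f} tokens/step (2**19 = {2**19:,})")
print(f"~{total_hours:.2f} h for {steps} steps")
print(f"~{total_tokens / 1e9:.2f}B tokens seen")
```

This gives about 524,182 tokens per step (within 0.1% of 2^19 = 524,288), about 8.75 hours for 18,865 steps, matching the wall time quoted earlier, and roughly 9.89B tokens, in the same ballpark as the 10B-token fineweb sample named above.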

You can buy these GPUs on Amazon for under $1k. I've heard the MI300X may already be available on Azure, or will be very soon.