Hacker News
by ml_hardware 1853 days ago
I think your math is backwards. The training workload W is the same on both systems (256 v4 chips vs. 4096 v3 chips), so the time to complete it on each is:

W / (256 * speed_v4) = 1.82

W / (4096 * speed_v3) = 0.39

speed_v4 / speed_v3 = (4096 * 0.39) / (256 * 1.82) = 3.43

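Note that this assumes training speed scales perfectly linearly with the number of accelerators, which is not true once you get to very large counts (like 4096!). The 4096-chip run loses more to scaling overhead, which makes each v3 chip look slower than it really is. So the true per-chip ratio should be smaller than the 3.43x above, and the reported 2.7x makes sense.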
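For anyone who wants to check the arithmetic, here is a rough Python sketch of the same calculation. The function and variable names are made up for illustration, and the last step is only a back-of-envelope estimate: it assumes the reported 2.7x is the true per-chip ratio and treats the whole gap as scaling loss on the larger system.

```python
def per_chip_speedup(chips_a, time_a, chips_b, time_b):
    """Per-chip speed of system A relative to system B.

    Assumes both systems ran the same workload W and that throughput
    scales perfectly linearly with chip count (the simplification
    discussed above), so W = chips * speed * time on each system.
    """
    return (chips_b * time_b) / (chips_a * time_a)


# Numbers from the comment: 256 v4 chips finish in 1.82, 4096 v3 chips in 0.39.
ratio = per_chip_speedup(chips_a=256, time_a=1.82, chips_b=4096, time_b=0.39)
print(f"{ratio:.2f}")  # 3.43

# Assumption: if 2.7x is the real per-chip ratio, the 4096-chip run
# achieved roughly this fraction of the 256-chip run's scaling efficiency.
reported = 2.7
relative_efficiency = reported / ratio
print(f"{relative_efficiency:.2f}")  # 0.79
```

So under those assumptions the 4096-chip system would be running at roughly 79% of the smaller system's scaling efficiency, which is plausible at that size.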