Hacker News
by PaulHoule 2976 days ago
I know an engineer who prototypes GPU-like systems on FPGAs, and he has told me to be skeptical of performance miracles.

No matter how fast a system is on the inside, you have to get data in and out of it -- at the very least to and from memory. SRAM takes too much die area, and there is a limit to DRAM bandwidth despite technologies such as eDRAM and HBM. Some tasks are compute-intensive, but for general tasks a processor that is 100x faster would need 100x faster memory to really be 100x faster.
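The "100x compute needs 100x memory" argument is essentially the roofline model: attainable throughput is the smaller of peak compute and memory bandwidth times the workload's arithmetic intensity (FLOPs per byte moved). Here is a minimal sketch of it. The machine figures are made up for illustration, and the function names are hypothetical.

```python
def attainable_flops(peak_flops: float, mem_bandwidth: float, intensity: float) -> float:
    """Roofline model: throughput is capped by compute or by memory traffic.

    peak_flops    -- peak compute rate in FLOP/s
    mem_bandwidth -- memory bandwidth in bytes/s
    intensity     -- arithmetic intensity of the workload in FLOP per byte moved
    """
    return min(peak_flops, mem_bandwidth * intensity)


def speedup(old_peak: float, new_peak: float,
            old_bw: float, new_bw: float, intensity: float) -> float:
    """Ratio of attainable throughput after an upgrade to before it."""
    return (attainable_flops(new_peak, new_bw, intensity)
            / attainable_flops(old_peak, old_bw, intensity))


if __name__ == "__main__":
    # Hypothetical baseline machine: 10 TFLOP/s of compute, 500 GB/s of memory.
    peak, bw = 10e12, 500e9
    for intensity in (1.0, 100.0, 10_000.0):
        compute_only = speedup(peak, 100 * peak, bw, bw, intensity)
        both = speedup(peak, 100 * peak, bw, 100 * bw, intensity)
        print(f"{intensity:>8.0f} FLOP/byte: 100x compute -> {compute_only:.0f}x, "
              f"100x compute and memory -> {both:.0f}x")
```

With these invented numbers, a memory-bound workload (1 FLOP/byte) gains nothing from 100x more compute. A moderate one (100 FLOP/byte) gains about 5x. Only the very compute-intensive case sees the full 100x unless memory bandwidth scales as well.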

Thus advances in real-life performance are likely to be more like a factor of 2.

For training I never pay full price in the AWS cloud; instead I run interruptible instances and pay a fraction of the list price. People I know who train in the Google cloud seem to get interrupted all the time, even though they are paying full price.
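Interruptible instances pay off only if the work lost to interruptions stays small compared with the discount. The back-of-the-envelope model below makes that trade-off explicit. The prices, interruption counts, and the assumption that each interruption loses half a checkpoint interval on average are all hypothetical. They are not AWS or Google pricing.

```python
from dataclasses import dataclass


@dataclass
class TrainingRun:
    compute_hours: float                # hours of useful training work the job needs
    price_per_hour: float               # hourly instance price
    interruptions: int = 0              # times the instance is reclaimed mid-run
    checkpoint_interval_h: float = 1.0  # hours between saved checkpoints
    restart_overhead_h: float = 0.1     # hours to get a new instance and reload state

    def billed_hours(self) -> float:
        # Assumption: an interruption lands, on average, halfway between
        # checkpoints, so half an interval of work is lost and must be redone.
        lost_per_interrupt = self.checkpoint_interval_h / 2 + self.restart_overhead_h
        return self.compute_hours + self.interruptions * lost_per_interrupt

    def cost(self) -> float:
        return self.billed_hours() * self.price_per_hour


if __name__ == "__main__":
    # Hypothetical prices: interruptible capacity at one third of list price.
    on_demand = TrainingRun(compute_hours=100, price_per_hour=3.00)
    interruptible = TrainingRun(compute_hours=100, price_per_hour=1.00, interruptions=10)
    print(f"on-demand:     {on_demand.billed_hours():.1f} h, ${on_demand.cost():.2f}")
    print(f"interruptible: {interruptible.billed_hours():.1f} h, ${interruptible.cost():.2f}")
```

In this sketch, checkpointing more often shrinks the lost-work term. However, it ignores the cost of writing the checkpoints themselves.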

Inference is another story. Once you have a trained model, you will usually run inference many, many more times than you run training, and the gap widens the larger the scale you operate at. Inference drives your unit costs, and that is where you need to pinch every penny.
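The point about unit costs can be made concrete by treating training as a one-off cost and inference as a per-request cost that grows with volume. The sketch below uses invented figures ($10,000 to train, $0.0001 per request), and the function names are hypothetical.

```python
def lifetime_cost(training_cost: float, cost_per_inference: float, n_inferences: float) -> float:
    """Total spend for a model: a one-off training cost plus per-request inference cost."""
    return training_cost + cost_per_inference * n_inferences


def inference_share(training_cost: float, cost_per_inference: float, n_inferences: float) -> float:
    """Fraction of lifetime spend that goes to inference."""
    inference = cost_per_inference * n_inferences
    return inference / (training_cost + inference)


if __name__ == "__main__":
    # Hypothetical figures: $10,000 to train, $0.0001 per inference request.
    train, per_req = 10_000.0, 0.0001
    for n in (1e6, 1e8, 1e10):
        print(f"{n:.0e} requests: total ${lifetime_cost(train, per_req, n):,.0f}, "
              f"inference share {inference_share(train, per_req, n):.1%}")
    # Savings from a 10% cut in per-request cost at 1e10 requests.
    saving = lifetime_cost(train, per_req, 1e10) - lifetime_cost(train, 0.9 * per_req, 1e10)
    print(f"10% cheaper inference at 1e10 requests saves ${saving:,.0f}")
```

With these numbers, inference spend equals training spend at `training_cost / cost_per_inference` = 1e8 requests. Past that point, even a 10% saving per request soon exceeds the entire training bill.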