by pama
388 days ago
I have nothing against random anecdotes per se, but a lot of academic code does not use GPU hardware efficiently. If you can estimate with pen and paper how many FLOP/s your code achieved, based on the main operations it had to do, and compare that number to the theoretical bfloat16 peak of the NVIDIA GPUs (about 2.6 * 10^15 for the 8 A100s, IIRC), then you can see a bit better how close your code was to optimal. I have seen low-effort implementations reach less than 1% of these theoretical numbers, and people were super happy because it was sufficiently fast anyway (which is fine) and the GPU showed as utilized all the time (but with only 1% of the available ALUs doing anything useful at any given time).
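For what it's worth, the pen-and-paper estimate can be sketched in a few lines of Python. Everything here is an assumption for illustration: the per-GPU peak of 312e12 FLOP/s is the dense BF16 tensor-core figure from the A100 spec sheet (8 of them come to roughly 2.5 * 10^15, close to the number quoted above), and the workload (1000 square matmuls of size 4096 taking 0.5 s) is made up. The function names are hypothetical, not from any library.

```python
# Assumed peak dense BF16 tensor-core throughput of one A100, in FLOP/s.
A100_BF16_PEAK = 312e12


def matmul_flops(m, n, k):
    """FLOPs for an (m x k) @ (k x n) matmul: one multiply and one add per term."""
    return 2 * m * n * k


def utilization(total_flops, seconds, num_gpus=8, peak_per_gpu=A100_BF16_PEAK):
    """Fraction of the theoretical peak actually achieved over the measured time."""
    achieved_flops_per_s = total_flops / seconds
    return achieved_flops_per_s / (num_gpus * peak_per_gpu)


if __name__ == "__main__":
    # Hypothetical workload: 1000 matmuls of 4096 x 4096 x 4096 in 0.5 s wall time.
    total = 1000 * matmul_flops(4096, 4096, 4096)
    frac = utilization(total, seconds=0.5)
    print(f"achieved fraction of peak: {frac:.1%}")
```

Only the dominant operations need counting (usually the matmuls); if the result comes out near 1%, the GPU can still show as "busy" the whole time in monitoring tools.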
|
They'd probably have to spend $5k-20k on a multicore or NUMA-style box to get huge gains on multithreaded code. They'd also lose the cool factor of saying they're using an RTX. There may also be grant money that's tied to GPU use. Between the three, it might make sense, even financial sense, to get a sub-$2000 GPU to accelerate academic code that barely uses the GPU.
I'm just brainstorming here, though.