by 0x07c0 3724 days ago
The Tesla K40 has a peak double-precision performance of ~1.4 TFLOPS. Each SMX has 64 DP units, and the warp schedulers can issue four warps per SMX per cycle. With 32 threads per warp, an SMX can therefore have two warps executing double-precision instructions at the same time. But that number is not very interesting. The memory bandwidth, on the other hand, is: a GK110 has 288 GB/s. Take your code, work out its arithmetic intensity (FLOPs per byte moved to or from memory), and you have an upper bound for your performance, assuming you are memory bound of course.
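
To make the bound concrete, here is a rough Python sketch of that calculation. It uses only the two figures quoted above (~1.4 TFLOPS and 288 GB/s). The function names and the DAXPY example kernel are my own illustration, and the byte count assumes every operand goes to DRAM with no cache reuse, so treat it as a back-of-the-envelope estimate rather than a measurement.

```python
# Figures quoted in the comment above.
PEAK_DP_FLOPS = 1.4e12   # ~1.4 TFLOPS peak double precision (Tesla K40)
MEM_BANDWIDTH = 288e9    # 288 GB/s memory bandwidth (GK110)


def attainable_flops(arithmetic_intensity):
    """Upper bound in FLOP/s for a kernel with the given arithmetic
    intensity (FLOPs per byte moved to or from memory)."""
    # Memory-bound kernels are limited by intensity * bandwidth,
    # compute-bound kernels by the peak rate.
    return min(PEAK_DP_FLOPS, arithmetic_intensity * MEM_BANDWIDTH)


def ridge_point():
    """Arithmetic intensity above which a kernel stops being memory bound."""
    return PEAK_DP_FLOPS / MEM_BANDWIDTH


# Hypothetical example kernel: DAXPY, y[i] = a * x[i] + y[i] in doubles.
# Assumption: 2 FLOPs per element (multiply + add) and 3 * 8 bytes of
# traffic per element (read x, read y, write y), no cache reuse.
DAXPY_INTENSITY = 2 / (3 * 8)

if __name__ == "__main__":
    bound = attainable_flops(DAXPY_INTENSITY)
    print(f"DAXPY bound: {bound / 1e9:.1f} GFLOP/s")
    print(f"Ridge point: {ridge_point():.2f} FLOP/byte")
```

Under these assumptions DAXPY tops out at roughly 24 GFLOP/s, under 2% of the peak figure, and a kernel needs close to 5 FLOPs per byte before the 1.4 TFLOPS number starts to matter.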

https://www.nvidia.com/content/PDF/kepler/NVIDIA-Kepler-GK11...

https://www.nvidia.com/content/tesla/pdf/nvidia-tesla-k40-20...