by KeplerBoy
959 days ago
Just for clarification: The 1080 has 20 SMs with 128 FPUs each. Each FPU can perform 2 FLOPs per cycle (one fused multiply-add counts as two). Combined with the base clock of 1607 MHz, we land on the advertised 8.2 TFLOP/s. The fact that each SM can keep up to 2048 threads resident (the maximum CUDA block size on that card is 1024) doesn't do much for the theoretical FLOPs. Only a fraction of those threads can be active at a time. The others are idling or waiting on their memory requests. This hides a lot of the memory latency.
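For anyone who wants to check the arithmetic, here is a rough sketch in Python. The function name `peak_tflops` is made up for illustration, and the inputs are just the figures quoted above.

```python
def peak_tflops(sm_count, fpus_per_sm, flops_per_fpu_per_cycle, clock_mhz):
    """Theoretical peak throughput in TFLOP/s (10^12 FLOPs per second)."""
    flops_per_second = (
        sm_count * fpus_per_sm * flops_per_fpu_per_cycle * clock_mhz * 1e6
    )
    return flops_per_second / 1e12


# Figures from the comment: 20 SMs, 128 FPUs each, 2 FLOPs per cycle (FMA), 1607 MHz.
gtx_1080 = peak_tflops(
    sm_count=20, fpus_per_sm=128, flops_per_fpu_per_cycle=2, clock_mhz=1607
)
print(f"{gtx_1080:.2f} TFLOP/s")  # prints "8.23 TFLOP/s"
```

Note that the number of resident threads never appears in the formula; only the FPU count and the clock do.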
It's still somewhat interesting because threads are a low-level programming primitive. If you can come up with work for 40k simultaneous threads, you can use the GPU effectively. For some tasks this parallelization is obvious (an HD video frame has about 2 million pixels, and shading them independently is trivial), but often it's anything but.
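To put rough numbers on the pixel example, here is a back-of-the-envelope sketch in Python. It assumes 2048 resident threads per SM (which is what makes 20 SMs come out at roughly 40k), one thread per pixel, and a block size of 256; the block size is an arbitrary choice for illustration, not something the card dictates.

```python
import math

SM_COUNT = 20
RESIDENT_THREADS_PER_SM = 2048  # assumption: per-SM residency limit on this card
BLOCK_SIZE = 256                # assumption: arbitrary but common launch choice

# Threads the whole GPU can keep in flight at once (the "40k" figure).
resident_threads = SM_COUNT * RESIDENT_THREADS_PER_SM

# One thread per pixel of a 1920x1080 frame.
width, height = 1920, 1080
pixels = width * height

# Blocks needed to cover the frame, and how many times over
# the frame fills the GPU's resident-thread capacity.
blocks = math.ceil(pixels / BLOCK_SIZE)
fill_factor = pixels / resident_threads

print(resident_threads)  # prints 40960
print(pixels)            # prints 2073600
print(blocks)            # prints 8100
print(fill_factor)       # prints 50.625
```

So a single HD frame offers about 50 times more independent work items than the GPU can hold at once, which is why this kind of task is the easy case.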