by daxfohl
1840 days ago
IIUC, GPU L0 cache latency is ~20x higher than a CPU's. In fact, in this case I think the GPU would have to go through L2 cache, since the data is shared across so many cores, so now you're talking ~50x. So even if you get full parallelism of the cell computation, you can plug in the numbers and find it would be far slower than an FPGA (but still faster than a CPU). I'm not an expert though. Maybe GPUs have some way of mitigating the high cache latencies.
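To make "plug in the numbers" concrete, here is a rough back-of-the-envelope sketch in Python. Only the ~20x and ~50x latency ratios come from the comment above. The grid size, the lane count, the FPGA figures, and the `relative_step_time` helper are all made-up assumptions for illustration, and the model deliberately ignores latency hiding, which is exactly the mitigation I'm unsure about.

```python
def relative_step_time(latency_ratio: float, parallel_lanes: int, cells: int) -> float:
    """Cost of updating every cell once, in units of one CPU cache access.

    Toy model (assumption): each cell update costs exactly one cache access,
    lanes run in lockstep, and no latency is overlapped or hidden.
    """
    rounds = -(-cells // parallel_lanes)  # ceiling division: batches needed
    return rounds * latency_ratio


CELLS = 1_000_000   # hypothetical grid size
GPU_LANES = 1024    # hypothetical number of cells a GPU updates at once

cpu = relative_step_time(1, 1, CELLS)             # one cell at a time, baseline latency
gpu_l0 = relative_step_time(20, GPU_LANES, CELLS) # ~20x latency (from the comment)
gpu_l2 = relative_step_time(50, GPU_LANES, CELLS) # ~50x latency (from the comment)
# Assumption: an FPGA gives every cell its own logic and local registers,
# so all cells update in one round at roughly baseline latency.
fpga = relative_step_time(1, CELLS, CELLS)

for name, cost in [("cpu", cpu), ("gpu_l0", gpu_l0), ("gpu_l2", gpu_l2), ("fpga", fpga)]:
    print(f"{name:7s} {cost:>12,.0f}")
```

Under these assumptions the ordering comes out as FPGA < GPU < CPU, which matches the claim, but the actual gaps depend entirely on the invented lane count and grid size, so treat the absolute numbers as meaningless.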