by minthd
3939 days ago
The bottleneck actually is arithmetic: "GPUs have much higher ALU throughput since the GPU chip area is almost entirely ALU" http://devblogs.nvidia.com/parallelforall/bidmach-machine-le... Also on the horizon is 3D chip manufacturing technology (monolithic 3D), with extremely large bandwidth between the two layers of the chip, possibly GPU + DRAM.
The energy cost of moving a single data word 5 mm on-chip is higher than the cost of a single FLOP (20 picojoules/bit). 5 mm is roughly the distance to the L2 cache or to another CPU core. The cost of transferring data off-chip (3D chip and/or RAM) is orders of magnitude higher; see the graph in [0].
[0] http://iwcse.phys.ntu.edu.tw/plenary/HorstSimon_IWCSE2013.pd...
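To make the arithmetic behind this argument concrete, here is a rough Python sketch of an energy budget. The function names and every number in the usage lines are my own placeholders, not measurements and not values read off the linked graph. I am also assuming the 20 pJ figure applies per FLOP, which the comment leaves ambiguous.

```python
def energy_breakdown(flops, words_moved, pj_per_flop, pj_per_word_moved):
    """Return (compute_pj, movement_pj, movement_share) for one kernel.

    All energies are in picojoules. movement_share is the fraction of the
    total energy spent moving data rather than computing on it.
    """
    compute_pj = flops * pj_per_flop
    movement_pj = words_moved * pj_per_word_moved
    total_pj = compute_pj + movement_pj
    movement_share = movement_pj / total_pj if total_pj else 0.0
    return compute_pj, movement_pj, movement_share


def break_even_flops_per_word(pj_per_flop, pj_per_word_moved):
    """FLOPs a kernel must perform per word moved before its compute
    energy equals its data-movement energy."""
    return pj_per_word_moved / pj_per_flop


# Placeholder values, chosen only to match the shape of the claim
# (on-chip move > one FLOP, off-chip move orders of magnitude more).
PJ_PER_FLOP = 20.0             # assumption: the comment's 20 pJ, read as per FLOP
PJ_PER_WORD_ON_CHIP = 40.0     # hypothetical, not a measured value
PJ_PER_WORD_OFF_CHIP = 2000.0  # hypothetical, not a measured value

on_chip = break_even_flops_per_word(PJ_PER_FLOP, PJ_PER_WORD_ON_CHIP)
off_chip = break_even_flops_per_word(PJ_PER_FLOP, PJ_PER_WORD_OFF_CHIP)
print(f"break-even FLOPs per word, on-chip:  {on_chip}")
print(f"break-even FLOPs per word, off-chip: {off_chip}")
```

With real per-operation energies plugged in, the break-even ratio tells you how much arithmetic a kernel has to do per word fetched before arithmetic, rather than data movement, dominates its energy use.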