by adrian_b 1478 days ago
You are right that there are problems whose speed is limited by memory bandwidth, and for such problems GPUs may be better than CPUs.

Nevertheless, many problems of this kind require more memory than the 8 GB or 16 GB available on cheap GPUs, so CPUs remain better for those.

On the other hand, there are a lot of problems whose time-consuming part can be reduced to multiplications of dense matrices. When solving such problems, CPUs reach a large fraction of their maximum computational speed, regardless of whether the operands fit in the caches or not (when they do not fit, the operations can be decomposed into sub-operations on cache-sized blocks, and in such algorithms the cache lines are reused enough times that the time spent on transfers does not matter).
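To make the blocking idea concrete, here is a minimal sketch in plain Python. The function name `blocked_matmul` and the default tile size are my own choices for illustration, and a real BLAS library does far more (packing, vectorized micro-kernels, several cache levels). The only point is the loop structure: each tile of the operands is reused many times while it is resident in cache.

```python
def blocked_matmul(a, b, block=64):
    """Multiply a (n x k) by b (k x m), both lists of lists, tile by tile.

    `block` is a hypothetical tile size; in practice it is tuned so that
    the tiles being worked on fit in cache together.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # The three outer loops walk over tiles of C, A and B.
    for i0 in range(0, n, block):
        for j0 in range(0, m, block):
            for p0 in range(0, k, block):
                # The three inner loops stay inside one tile of each operand,
                # so every loaded element is reused up to `block` times.
                for i in range(i0, min(i0 + block, n)):
                    row_a, row_c = a[i], c[i]
                    for p in range(p0, min(p0 + block, k)):
                        a_ip, row_b = row_a[p], b[p]
                        for j in range(j0, min(j0 + block, m)):
                            row_c[j] += a_ip * row_b[j]
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(blocked_matmul(a, b, block=2))
```

With n x n operands the arithmetic grows as n^3 while the data grows only as n^2, which is why the reuse inside each tile can hide the transfer time.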

1 comments

I guess I was lucky with the CAM/CAE software I’m working on. We don’t have too many GB of data, so the stuff fits in the VRAM of inexpensive consumer cards.

One typical problem is multiplying a dense vector by a sparse matrix. Unlike the multiplication of two dense matrices, I don’t think it can be decomposed into manageable pieces that fit into caches and saturate the FP64 math of the CPU cores.
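A rough sketch of why, assuming the matrix is stored in CSR format and the product is y = A x (both are my assumptions for illustration, as are the names `csr_matvec` and `flops_per_byte`). Each stored nonzero is read exactly once and contributes one multiply and one add, so there is nothing to keep in cache and reuse on the matrix side.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A x for a sparse matrix A stored in CSR form."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        # Every nonzero of row r is touched once and never again.
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def flops_per_byte(value_bytes=8, index_bytes=4):
    """Upper bound on arithmetic intensity: 2 flops per stored nonzero.

    Assumes FP64 values and 32-bit column indices, and ignores the
    traffic for x, y and row_ptr, so the real ratio is lower still.
    """
    return 2.0 / (value_bytes + index_bytes)


if __name__ == "__main__":
    # A = [[1, 0, 2],
    #      [0, 3, 0],
    #      [4, 0, 5]]
    values = [1.0, 2.0, 3.0, 4.0, 5.0]
    col_idx = [0, 2, 1, 0, 2]
    row_ptr = [0, 2, 3, 5]
    print(csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0]))
    print(flops_per_byte())
```

Under these assumptions the kernel does at most about 1/6 of a flop per byte streamed from memory, so memory bandwidth sets the speed rather than the FP64 throughput.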

We tested our software on nVidia Teslas in the cloud (the expensive ones with many theoretical TFlops of FP64 compute), and the performance wasn’t too impressive.