by Const-me 1484 days ago
I guess I was lucky with the CAM/CAE software I’m working on. We don’t have too many GB of data, so everything fits in the VRAM of inexpensive consumer cards.

One typical problem is multiplying a dense vector by a sparse matrix. Unlike multiplication of two dense matrices, I don’t think it’s possible to decompose it into manageable pieces that fit into caches and saturate the FP64 math of the CPU cores.
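
To illustrate why that product is hard to block for caches, here is a minimal sketch of it in plain Python. It assumes the matrix is stored in CSR (compressed sparse row) form, which is only one common layout and not necessarily what our software uses; the name `csr_matvec` and the tiny 3x3 example are made up for illustration.

```python
def csr_matvec(n_rows, row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    row_ptr has n_rows + 1 entries; the nonzeros of row r are
    values[row_ptr[r]:row_ptr[r + 1]], with matching column
    indices in col_idx.
    """
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # values[k] and col_idx[k] are each used exactly once,
            # and x is read at a data-dependent position (a gather).
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Hypothetical example matrix:
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]

y = csr_matvec(3, row_ptr, col_idx, values, x)
print(y)  # [7.0, 4.0, 18.0]
```

Every nonzero is touched once, for one multiply and one add. With FP64 values and (assumed) 32-bit column indices, that is about 12 bytes of matrix data per 2 flops, plus the scattered reads of `x`, so there is nothing to reuse from cache the way a dense matrix tile gets reused.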

We have tested our software on nVidia Teslas in the cloud (the expensive ones with many theoretical TFlops of FP64 compute), and the performance wasn’t too impressive.