Hacker News new | ask | show | jobs
by rlu 3903 days ago
> The training converges after a few days of number crunching on a GTX980 GPU. Let’s take a look at the results.

Stupid question: why is the GPU important here? I would have thought this was more of a CPU task..??

(then again, as I typed this I remembered that bitcoin farming is supposed to be GPU intensive so I'm guessing the "why" for that is the same as this)

2 comments

A lot of this kind of work ends up being repetitive-- like multiplying two matrices together that have a few thousand entries each. These are the sorts of things that GPU's do very well with. GPU's have the ability to do such things on a massively parallel scale. GPU's also tend to have more memory bandwidth doing the kinds of things that a CPU would get bogged down on in the memory cache.
GPU's are really good at parallel tasks such as calculating the color of every pixel on the screen, or doing the same operation on a large dataset. According to Newegg, the GTX980 has 2048 CUDA cores (parallel processing cores) that run at ~1266 MHz as opposed to a nice CPU which might have 4 cores that run at 4 GHZ. In other words, if you want to manipulate a whole bunch of things in one way in parallel, you can program it to use the GPU effectively, if you want to manipulate one thing a whole bunch of ways in series, CPU is your best bet.

(note: this is massively oversimplified)

Coarse rule-of-thumb: running on Geforce class GPUs you can get up to 5x, maaaybe 10x the performance per dollar as compared to a top-line CPU. Assuming your problem scales well on GPUs, many problems don't. The GTX980 is actually a great performer. For Tesla class systems like the K40 it's a lot closer to equal with the CPU on performance/$ (they're not much faster than the GTX980 but a lot more expensive). But you can get an edge with the Teslas when you start comparing multi-GPU clusters to multi-CPU clusters, since with GPUs you need less of the super-expensive interconnect hardware. (You're not going to put GTX cards in a cluster, you'd have massive reliability problems.)

IMHO, the guys showing 100x speedups on GPUs are Doing It Wrong; they use a poor implementation on the CPU, use just one CPU core, consider a very synthetic benchmark, or a bunch of other tricks.