| HN Mirror



	by dfox 5042 days ago
	GPUs are primarily designed to be good at graphics, which implies a completely different internal architecture. While GPUs do have some graphics-oriented functional units, the main factor is that all these cores have to access a pretty large chunk of shared memory (textures, frame buffer...) and do so uniformly fast (while also supporting some weird addressing modes and access patterns). I suspect that a large part of the die area of a modern GPU is interconnect, and that there are really only a few very wide cores (something like VLIW + SIMD + possibly UltraSparcIV-style hyperthreading, though that can be faked by the compiler given a sufficiently large register set) that are made to look like a large number of simple cores by magic in the compiler (which seems consistent with the CUDA programming model). So: you can get a lot of performance out of a simple architecture, but only for some problems, and graphics is not one of them.
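
A toy sketch of the "few wide cores made to look like many simple cores" idea: one wide unit runs a per-element branch by masking lanes, so each lane looks like its own thread. This is only an illustration of the guess above, not how any particular GPU works; `WIDTH` and `run_branch` are made-up names.

```python
# Sketch: one wide SIMD unit pretending to be many scalar "threads".
# WIDTH and run_branch are hypothetical names, not real hardware or API.

WIDTH = 8  # assumed number of lanes in one wide core


def run_branch(values):
    """Run `if v is odd: v = 3*v + 1 else: v = v // 2` on every lane.

    Returns the per-lane results and how many wide instructions were issued.
    """
    assert len(values) == WIDTH
    # Every lane evaluates the condition in lockstep.
    mask = [v % 2 == 1 for v in values]

    issued = 0
    result = list(values)

    # "Then" side: issued once for the whole unit if any lane takes it.
    if any(mask):
        issued += 1
        result = [3 * v + 1 if m else r for v, m, r in zip(values, mask, result)]

    # "Else" side: issued once if any lane did not take the branch.
    if not all(mask):
        issued += 1
        result = [v // 2 if not m else r for v, m, r in zip(values, mask, result)]

    return result, issued


if __name__ == "__main__":
    print(run_branch([1, 2, 3, 4, 5, 6, 7, 8]))       # lanes disagree: both sides issued
    print(run_branch([2, 4, 6, 8, 10, 12, 14, 16]))   # lanes agree: one side issued
```

When the lanes disagree the unit pays for both sides of the branch; when they agree it pays for one.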


DeepDuh 5042 days ago

Sorry, but I have to correct a little bit here. Today's GPUs are

- not simple SIMD. NVIDIA calls it SIMT (single instruction, multiple thread), mostly because you can branch a subset of them, so to the programmer it does feel somewhat like threads.

- not just optimized for graphics anymore. E.g. since Fermi, the Tesla cards have DP performance = 50% of SP, which was introduced specifically for HPC purposes. They have also steadily improved the schedulers to move further toward general-purpose computing, e.g. Kepler 2 seems to support arbitrary call graphs on the device. Again, that's useless for graphics.

- suitable for pretty much all stencil computations. Even for heavily bandwidth-bound problems GPUs are generally ahead of CPUs, since they have very high memory bandwidth. The performance estimate I use for my master's thesis comes out at 5x for Fermi over a six-core Westmere Xeon for bandwidth-bound problems and 7.5x for compute-bound problems.
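
One common way to get separate bandwidth-bound and compute-bound estimates like the ones in the last bullet is a roofline-style bound: attainable throughput is the smaller of peak compute and bandwidth times arithmetic intensity. The sketch below assumes that model; it is not necessarily how the thesis estimate was derived. The `Machine` numbers are made-up placeholders, not Fermi or Westmere specs.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    bandwidth_gb_s: float   # sustained memory bandwidth, GB/s
    peak_gflops: float      # peak arithmetic throughput, GFLOP/s


def attainable_gflops(m: Machine, flops_per_byte: float) -> float:
    """Roofline-style bound: limited by compute or by memory traffic."""
    return min(m.peak_gflops, m.bandwidth_gb_s * flops_per_byte)


def speedup(a: Machine, b: Machine, flops_per_byte: float) -> float:
    """Estimated speedup of machine a over machine b at a given intensity."""
    return attainable_gflops(a, flops_per_byte) / attainable_gflops(b, flops_per_byte)


# Placeholder values chosen only for illustration (assumptions, not measurements).
gpu = Machine("hypothetical GPU", bandwidth_gb_s=160.0, peak_gflops=500.0)
cpu = Machine("hypothetical CPU", bandwidth_gb_s=40.0, peak_gflops=100.0)

if __name__ == "__main__":
    # Low intensity (stencil-like): both machines are limited by bandwidth.
    print(speedup(gpu, cpu, flops_per_byte=0.25))
    # High intensity: both machines are limited by peak compute.
    print(speedup(gpu, cpu, flops_per_byte=100.0))
```

At low intensity the ratio collapses to the bandwidth ratio, at high intensity to the peak-compute ratio, which is why the two cases can give different speedups.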

HPC is all about performance per dollar, performance per watt - and (sadly) sometimes Linpack results, because some institution wants to be at the top of some arbitrary list. In all of these respects GPUs come out ahead of x86, which has been very dominant since the 90s. That is why GPUs are now in 4 of the top 20 systems, each of which represents hundreds of millions of dollars in investment. That wouldn't be done if they weren't suitable for most computational problems.

dfox 5042 days ago

My point is that GPUs have a significantly different architecture from most of these "many cores on a chip" designs. The original reason was clearly that such an architecture was necessary for graphics; coincidentally, it also works better for many interesting HPC workloads. It's clear that manufacturers are introducing technologies that are not required for graphics, but they cannot be expected to make modifications that would render their GPUs unusable for graphics.

And as for SIMD/SIMT, I mentioned SIMD mostly in relation to operations on short vectors done by one thread, which is mostly irrelevant to the overall architecture of the core, as it can very well be implemented by pure combinational logic in one cycle given enough space. My mental model of how a modern GPU core (physical, not logical) actually works is essentially some kind of simplistic RISC/VLIW design with a large number of registers, with the compiler and/or hardware interleaving instructions of multiple threads into one pipeline. That may or may not be how it actually works, but it looks probable to me.
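
A toy model of that "interleave many threads into one pipeline" picture, to show why it helps: one issue slot per cycle, and a thread that has just issued has to wait a fixed latency before it can issue again. Everything here (`cycles_to_finish`, the round-robin policy, the fixed latency) is an assumption made for illustration, not a description of real GPU scheduling.

```python
def cycles_to_finish(num_threads, instrs_per_thread, latency):
    """Cycles until the last instruction has issued.

    One instruction may issue per cycle. After a thread issues, it is
    not ready again until `latency` cycles later. Ready threads are
    picked round-robin.
    """
    remaining = [instrs_per_thread] * num_threads
    ready_at = [0] * num_threads
    cycle = 0
    nxt = 0  # round-robin pointer
    while any(remaining):
        for offset in range(num_threads):
            t = (nxt + offset) % num_threads
            if remaining[t] and ready_at[t] <= cycle:
                remaining[t] -= 1
                ready_at[t] = cycle + latency
                nxt = (t + 1) % num_threads
                break
        # If no thread was ready, the issue slot simply stays empty this cycle.
        cycle += 1
    return cycle


if __name__ == "__main__":
    # One thread: the pipeline idles while waiting out each latency.
    print(cycles_to_finish(num_threads=1, instrs_per_thread=4, latency=4))
    # Four threads: another thread is always ready, so every slot is used.
    print(cycles_to_finish(num_threads=4, instrs_per_thread=4, latency=4))
```

With one thread, 4 instructions take 13 cycles to issue; with four threads, 16 instructions take 16 cycles, so the same pipeline stays fully busy as long as enough threads (and registers to hold their state) are available.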

In my opinion, most chips like Epiphany IV or XMOS or whatever are, in contrast to GPUs, useful for only limited classes of workloads, as they tend to be memory starved.