by blueblimp 972 days ago
I question the article's framing of CPUs as "universal" and GPUs as "specialized". In theory, both can perform any computation, so they differ only in their performance characteristics, and the deep learning revolution has shown that there is a wide range of practical workloads that are non-viable on CPUs. The reason OpenAI runs GPT-4 on GPUs isn't that it's faster than running it on CPUs; they do it because they _can't_ practically run GPT-4 on CPUs.

So what's going on is not a shift away from the universality of CPUs, but a realization that CPUs weren't as universal as we thought. It would be nice, though, if a single processor could achieve the best of both worlds.

5 comments

> [...] differ only in their performance characteristics [...]

... but that is exactly why CPUs are considered "universal" and GPUs as "specialized".

The whole concept of specialized hardware is to do fewer things more efficiently, and in tons of applications that means the problem suddenly becomes feasible. That has always been the case. I'm not sure what the deep learning revolution has shown with regard to this.

> In theory, they can both do any computation, so they differ only in their performance characteristics

But surely, the exact same thing can be said when comparing any two different machines that engage in computations and can do conditional branching.

GPUs can be used for any computational task that a general CPU can be used for, but a GPU is optimized so it will do certain sorts of tasks much better (and as a consequence will be worse at other kinds of tasks). CPUs are meant to be adequate (if not spectacular) at any sort of task.

It seems to me characterizing GPUs as "specialized" and CPUs as "not specialized" is entirely correct.

Indeed. Though I did like the rest of the article, three of the pillars the authors use to define specialization are dubious:

> 1. substantial numbers of calculations can be parallelized

> 2. the computations to be done are stable and arrive at regular intervals ('regularity')

> 3. relatively few memory accesses are needed for a given amount of computation ('locality')
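The third pillar ('locality') is commonly quantified as arithmetic intensity: floating-point operations per byte of memory traffic. Below is a minimal sketch, assuming float32 data and ideal traffic (each input read once, each output written once, no cache effects modeled); the function names are hypothetical. It contrasts vector addition, whose intensity is a constant 1/12 FLOP/byte regardless of size, with n x n matrix multiplication, whose intensity grows as n/6.

```python
# Arithmetic intensity = floating-point operations per byte moved to/from memory.
# Assumption: float32 data and ideal traffic (each operand read once, each result
# written once); real kernels move more bytes because of cache misses and tiling.

BYTES_PER_FLOAT = 4  # assumption: float32


def intensity_vector_add(n: int) -> float:
    """c = a + b: n additions; reads 2n floats and writes n floats."""
    flops = n
    bytes_moved = 3 * n * BYTES_PER_FLOAT
    return flops / bytes_moved


def intensity_matmul(n: int) -> float:
    """C = A @ B for n x n matrices: n^3 multiplies + n^3 adds = 2n^3 flops.
    Ideal traffic reads A and B once and writes C once: 3n^2 floats."""
    flops = 2 * n ** 3
    bytes_moved = 3 * n * n * BYTES_PER_FLOAT
    return flops / bytes_moved


if __name__ == "__main__":
    # Vector add stays memory-bound at any size; matmul does more work per byte as n grows.
    print(intensity_vector_add(1_000_000))
    print(intensity_matmul(1024))
```

The point of the comparison is that "locality" is a property of the algorithm, not of the hardware: the same processor sees very different ratios of compute to memory traffic depending on which of these two kernels it runs.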

Where (1) fails, any modern multicore + SIMD + ILP desktop/console/mobile CPU will run at a tiny fraction of its peak throughput. While sufficiently small serial tasks still complete in "good enough" time, the same could be said of running serial programs on a GPU (in fact this is sometimes required in GPU programming). People routinely (and happily) use programming-language implementations that are ~100x slower than C. Whether a ludicrous under-utilization factor is acceptable depends on how tiny your workload is and how much time you have to kill. Parallelism is used broadly for performance; it's about as un-specialized as you can get!
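To put a rough number on "a tiny fraction of its peak throughput", here is a minimal sketch of the upper bound on utilization for a single chain of dependent scalar operations. It assumes a hypothetical CPU whose peak arithmetic rate comes from cores x SIMD lanes x independent vector operations issued per cycle; the function name and the parameter values are illustrative, not measurements of any real chip.

```python
def serial_utilization(cores: int, simd_lanes: int, ilp: int) -> float:
    """Upper bound on the fraction of peak arithmetic throughput that one
    dependent chain of scalar operations can use.

    Assumed model (hypothetical): peak = every core issues `ilp` independent
    vector ops per cycle, each `simd_lanes` wide. A serial scalar chain keeps
    at most one lane of one unit on one core busy. Real code usually does
    worse, since dependent ops also wait on instruction latency and memory.
    """
    if min(cores, simd_lanes, ilp) < 1:
        raise ValueError("all parameters must be >= 1")
    peak_lanes = cores * simd_lanes * ilp
    return 1 / peak_lanes


if __name__ == "__main__":
    # Hypothetical desktop-class part: 16 cores, 8 float32 lanes, 2 vector ops/cycle.
    frac = serial_utilization(cores=16, simd_lanes=8, ilp=2)
    print(f"{frac:.4%} of peak")
```

With those illustrative parameters the bound is 1/256, roughly 0.4% of peak, which is the same order of under-utilization the comment describes for serial code on a GPU.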

(2) and (3) are really extensions of (1), but both remain major issues for serial implementations too. For the most part there aren't "serial applications" and "parallel applications"; rather, parallelism is a factor in algorithm selection and optimization, and almost anything can be made parallel. Naturally you specialize HW to extract high performance, and high performance requires parallelism, on specialized HW as everywhere else.
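As an illustration of parallelism being a matter of algorithm selection, here is a minimal sketch comparing a serial left fold with a pairwise tree reduction (function names are hypothetical). Both compute the same result for any associative operation, but the fold's critical path is n - 1 dependent steps while the tree needs only about log2(n) levels, and every operation within a level is independent of the others. This relies on associativity: for floating-point addition, reassociating the sum can change the rounding of the result.

```python
import operator
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def serial_reduce(op: Callable[[T, T], T], xs: Sequence[T]) -> Tuple[T, int]:
    """Left fold. Each step depends on the previous accumulator, so the
    critical path (returned as the second value) is len(xs) - 1 operations."""
    acc = xs[0]
    for x in xs[1:]:
        acc = op(acc, x)
    return acc, len(xs) - 1


def tree_reduce(op: Callable[[T, T], T], xs: Sequence[T]) -> Tuple[T, int]:
    """Pairwise (tree) reduction. All ops within one level are independent and
    could run in parallel; the returned depth is the number of levels, i.e.
    the critical path, which is ceil(log2(len(xs)))."""
    level: List[T] = list(xs)
    depth = 0
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # odd element carries over to the next level unchanged
            nxt.append(level[-1])
        level = nxt
        depth += 1
    return level[0], depth


if __name__ == "__main__":
    data = list(range(1, 1001))
    print(serial_reduce(operator.add, data))
    print(tree_reduce(operator.add, data))
```

For the integers 1 through 1000 both versions return 500500, but the serial fold has a dependency chain of 999 additions while the tree finishes in 10 levels. The total work is the same; only the shape of the dependency graph changed, which is what lets the work spread across cores or SIMD lanes.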

The authors somewhat gesture towards the faults of their definition of "specialized" later on. Truly specialized HW trades much (or all) programmability for performance, a criterion that excludes GPUs of the last ~15 years:

> [The] specialization with GPUs [still] benefited a broad range of applications... We also expect significant usage from those who were not the original designer of the specialized processor, but who re-design their algorithm to take advantage of new hardware, as deep learning users did with GPUs.

You can't run a GPU without a CPU, but you can run a CPU without a GPU.

Could we change GPUs so they become more general purpose? Yeah, and we already have, by gluing the GPU to a CPU, i.e. integrated graphics; lots of CPUs have that. But when you do that we call it a CPU and not a GPU, so as soon as you make a GPU general purpose we start calling it a CPU.

I would also question the framing, because the systems this specialized hardware is running are the most general software systems we've ever created.