Indeed. Though I did like the rest of the article, three of the authors' pillars for defining specialization are dubious:

> 1. substantial numbers of calculations can be parallelized
> 2. the computations to be done are stable and arrive at regular intervals ('regularity')
> 3. relatively few memory accesses are needed for a given amount of computation ('locality')

Where (1) fails, any modern multicore + SIMD + ILP desktop/console/mobile CPU will run at a tiny fraction of its peak throughput. While sufficiently small serial tasks still complete in "good enough" time, the same could be said of running serial programs on a GPU (in fact this is sometimes required in GPU programming). People routinely (and happily) use PL implementations which are ~100x slower than C. The acceptability of ludicrous under-utilization factors depends on the tininess of your workload and the amount of time you have to kill. Parallelism is used broadly for performance; it's about as un-specialized as you can get!

(2) and (3) are really extensions of (1), but both remain major issues for serial implementations too. There mostly aren't serial or parallel applications; rather, parallelism is a factor in algorithm selection and optimization. Almost anything can be made parallel. Naturally you specialize HW to extract high performance, which requires parallelism, for specialized HW as for everywhere else.

The authors somewhat gesture towards the faults of their definition of "specialized" later on. Truly specialized HW trades away much (or all) programmability for performance, a criterion that excludes GPUs from the last ~15 years:

> [The] specialization with GPUs [still] benefited a broad range of applications... We also expect significant usage from those who were not the original designer of the specialized processor, but who re-design their algorithm to take advantage of new hardware, as deep learning users did with GPUs.
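
To put a rough number on the "tiny fraction of its peak throughput" point where (1) fails, here is a back-of-the-envelope sketch. The core count, SIMD width, FMA pipe count, and FMA latency are illustrative assumptions, not measurements of any particular chip, and `serial_utilization` is a hypothetical helper name. It models only a single scalar, dependency-chained thread and ignores memory effects entirely.

```python
from fractions import Fraction


def serial_utilization(cores, simd_lanes, fma_pipes, fma_latency_cycles):
    """Fraction of peak FMA throughput reached by one scalar, dependency-chained thread."""
    # Peak: every core issues one full-width FMA on every pipe, every cycle.
    peak_per_cycle = cores * simd_lanes * fma_pipes
    # Serial chain: one core, one lane, and each FMA waits on the previous result,
    # so only one scalar FMA completes every `fma_latency_cycles` cycles.
    achieved_per_cycle = Fraction(1, fma_latency_cycles)
    return achieved_per_cycle / peak_per_cycle


# Assumed (hypothetical) desktop-class part:
# 8 cores, 8 float32 SIMD lanes, 2 FMA pipes per core, 4-cycle FMA latency.
u = serial_utilization(cores=8, simd_lanes=8, fma_pipes=2, fma_latency_cycles=4)
print(f"utilization = {u} (~{float(u):.2%})")  # utilization = 1/512 (~0.20%)
```

Under these assumed numbers a fully serial dependency chain reaches about 1/512 of peak, which is worse than the ~100x interpreter slowdown that people already tolerate.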