by speeder 4862 days ago
I wonder if we will ever figure a way to resume improving clock cycles instead of adding more parallelism.

Parallelism has two major issues:

First, not all applications need it. In many cases you just want to apply a series of operations to a single starting number and you don't need anything else. If you are calculating a factorial, for example, and you need only one factorial, it is useless to make it more parallel.

Second, it is absurdly hard to write code for heavily parallelised hardware. Most coders will write crap code that doesn't work, no matter how good we become at making helper libraries; it is a totally different way of thinking.

Yes, for some things, like servers, where you can throw a user onto each core, it is nice... But for many other uses, even simple single-core parallelism, like SIMD, is not very useful.

6 comments

> I wonder if we will ever figure a way to resume improving clock cycles instead of adding more parallelism.

IPC has been steadily improving generation over generation, but it is a slow march. Chip frequency does not seem like it is going to go anywhere without some serious breakthroughs; processors are thermally limited, and while you can work on saving power there don't seem to be any 10x improvements in power coming that could let you crank up the clock. Scaling voltage down is great for reducing switching power, but it depends on smaller and smaller transistors, so leakage power has been steadily growing and eating into those gains.
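To put rough numbers on the voltage/frequency trade-off, here is a minimal sketch using the standard first-order model for switching power, P ≈ α·C·V²·f. The function name and the scale factors are illustrative assumptions, not measurements of any real chip, and the model ignores leakage power entirely.

```python
def relative_switching_power(voltage_scale: float, frequency_scale: float) -> float:
    """Relative dynamic power under P ~ C * V^2 * f.

    Capacitance and activity factor are assumed unchanged, so only the
    voltage and frequency scale factors matter. Leakage is not modelled.
    """
    return voltage_scale ** 2 * frequency_scale


# Dropping the supply voltage by 20% at the same clock (hypothetical numbers):
print(relative_switching_power(0.8, 1.0))

# Doubling the clock at the same voltage (hypothetical numbers):
print(relative_switching_power(1.0, 2.0))
```

The quadratic term is why voltage scaling was such a good deal while it lasted, and why a clock bump that also needs more voltage gets expensive quickly.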

Chips are up against a lot of walls: power consumption, heat dissipation, and so on. Chip makers have been and still are pushing forward, but short of a new kind of transistor there do not appear to be any order-of-magnitude improvements on the horizon for single-core performance.

This is why parallelism is important. It is hard, and not every workload can be parallelised well, but there is simply no other known way to secure a 4x, 8x, 16x, etc. boost in performance than 4x, 8x, 16x, etc. parallelism. (Assuming your code isn't terrible, in which case fix your code!)

For increasing frequency the problem is we used to be able to just bump the clock speeds when the circuits shrank, but we've gotten small enough that the transistors start to leak now, so we have to drive them with less voltage to prevent the chip from overheating, meaning the clock has to come back down </huge-oversimplification>.

There are some possible ways forward. If we're sticking with transistors we might be able to switch to a material with better electrical properties, but that needs lots of research before it'll be higher performance than silicon.

There are also funky non-transistor based ideas for doing computation, like using DNA or nano-scale clockwork or ballistic electrons. I have no idea how feasible that stuff is.

In the shorter term, computer engineers are finding ways to turn those extra transistors into better single-threaded performance through better prefetching, branch prediction, and re-ordering your instructions so that more than one is executed at a time, even though you never thought about parallelism when writing the code. That's why a modern computer core is much faster than an old Pentium 4, even though the clock speeds might be the same.

The problem with that is that using more transistors tends to provide at most an O(sqrt(n)) speedup, whereas adding more cores potentially provides an O(n) speedup.
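A quick sketch of what that gap looks like. The sqrt relationship is the rule of thumb stated in the comment (often called Pollack's rule); the function names are made up for illustration, and the multi-core figure is the best case for a perfectly parallel workload, not something real code is guaranteed to reach.

```python
import math


def single_core_speedup(transistor_factor: float) -> float:
    """Speedup from spending `transistor_factor` times more transistors
    on one bigger core, assuming the sqrt rule of thumb."""
    return math.sqrt(transistor_factor)


def multi_core_speedup(transistor_factor: float) -> float:
    """Best-case speedup from spending the same budget on more identical
    cores, assuming the workload parallelises perfectly."""
    return transistor_factor


# Compare the two for a few transistor budgets.
for n in (4, 16, 64):
    print(n, single_core_speedup(n), multi_core_speedup(n))
```

With 16x the transistors, the bigger core gets you roughly 4x under this assumption, while 16 cores could get you up to 16x if the work splits cleanly.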

> First, not all applications need it, in many cases you want to do just a series of operations in a single starting number, and you don't need anything else, like if you are for example calculating a factorial, if you need only one factorial, it is useless to make it more parallel.

Sorry for nitpicking, but calculating a factorial can certainly be parallelized. The easiest way is to have each of n cores multiply every n-th number, and then multiply the n partial results together.
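A minimal sketch of that strided split in Python. The names `strided_product` and `parallel_factorial` are hypothetical, and the default thread pool only demonstrates the decomposition: in CPython the GIL keeps threads from running these multiplications in parallel, so for a real speedup you would pass `concurrent.futures.ProcessPoolExecutor` as `executor_cls` (from a script with an `if __name__ == "__main__":` guard).

```python
import math
from concurrent.futures import ThreadPoolExecutor


def strided_product(start: int, stop: int, step: int) -> int:
    """Multiply start, start + step, start + 2*step, ... up to and including stop."""
    result = 1
    for k in range(start, stop + 1, step):
        result *= k
    return result


def parallel_factorial(n: int, workers: int = 4, executor_cls=ThreadPoolExecutor) -> int:
    """Compute n! as `workers` strided partial products, then combine them."""
    if n < 0:
        raise ValueError("n must be non-negative")
    with executor_cls(max_workers=workers) as pool:
        # Worker i handles i, i + workers, i + 2*workers, ...
        futures = [
            pool.submit(strided_product, offset, n, workers)
            for offset in range(1, workers + 1)
        ]
        partials = [f.result() for f in futures]
    # The final combine step is sequential.
    return math.prod(partials)


print(parallel_factorial(20) == math.factorial(20))
```

Note that the last step, multiplying the partial results together, is itself sequential and involves the largest numbers, so it limits how much the split can help.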

I doubt that you can get much of an increase in performance as cores increase unless you are calculating with numbers that have a huge number of bits.
Considering that the factorial of 1e6 has about 18e6 bits (and the factorial of 1e3 has about 8.5e3 bits)? Yes, any factorial that doesn't have a huge number of bits will be fast enough to calculate that there's not much point in parallelizing it.
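Those bit counts are easy to sanity-check. A small sketch: the helper name `factorial_bits_estimate` is made up, and it relies on the identity lgamma(n + 1) = ln(n!), so it returns an approximation of log2(n!) rather than an exact bit length.

```python
import math


def factorial_bits_estimate(n: int) -> float:
    """Approximate log2(n!) without computing n!, via lgamma(n + 1) = ln(n!)."""
    return math.lgamma(n + 1) / math.log(2)


# Exact bit length for a small case, where computing n! directly is cheap.
print(math.factorial(1000).bit_length())

# Estimate for a case where you may not want to build the full number.
print(round(factorial_bits_estimate(10**6)))
```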
As the size of the input to the procedure increases, you will indeed be calculating with numbers that have a huge number of bits.
Well, what we really need to keep improving clock speeds is better cooling. Overclocking modern chips is really easy, and all you need to get a pretty solid increase in speed is a decent CPU cooler. Yes, to get drastic increases you need to raise the chip's voltage, and there are concerns about the chip degrading faster at higher clock speeds, but for the most part a solid increase in speed can be achieved simply by telling it to go faster and making sure it doesn't overheat.
Sure, parallel code is harder than sequential code, but it's not really all that much harder (maybe about 8 credit-hours at your local university?). The reason it's "too hard" is that most programs aren't slow enough to be worth the bother.
If you're interested in faster clock cycles, check out the specs on this beast: http://www-03.ibm.com/systems/z/hardware/zenterprise/zec12.h...
> If you're interested in faster clock cycles, check out the specs on this beast: http://www-03.ibm.com/systems/z/hardware/zenterprise/zec12.h....

You can overclock an old Core 2 higher than that, with a bit of luck and good gear. And it'll undoubtedly be cheaper than buying an IBM 'frame.

FWIW, overclocking records are currently above 8GHz on Vishera using liquid nitrogen (N2) cooling. Granted, N2 cooling can't actually be used day to day, but that gives you the limits of the chips. You can reach 5.5GHz on water, and 6+ on cascade or single stage (which do work as standard cooling solutions).