by andrewmunn 5309 days ago
Hello, I'm the author of this article. I do not have a degree in electrical or computer engineering. I'm merely stating the trends I've seen in the PC industry over the last few years.

I am, however, a software engineer. I know that most of the perceived lag on a modern desktop is due not to the CPU but to inefficient I/O to the hard disk or network. One need only look at the iPad 2 to see that it's very possible to make a fast computer with beautiful 60 FPS animations and snappy applications using only an 800 MHz dual-core ARM CPU. Ironically, my iPad feels way faster than my MacBook Pro most of the time.

You don't need to be an expert in the microprocessor industry to know that the CPU performance race is over. It's all about power consumption now, and X86 fails miserably at lower power computing. Unless you know something I don't.


I know that most of the perceived lag on a modern desktop is not due to the CPU, but inefficient I/O to the hard disk or network

Most perceived lag on a modern desktop comes from excessive abstraction, which results in poor coding practices. You could certainly argue that I/O bottlenecks or a lack of system resources have an impact, but that impact won't be felt until the environment is somewhat saturated. A simple fix for the hard drive bottleneck is to put in a SATA3 SSD instead, or to give the system more RAM to boost disk caching; problem solved. On the other hand, no amount of system resources will alleviate a performance hit caused by shoddy coding. This is why I refuse to use Google Docs: its performance is about as good as WordPerfect on Windows 95 because of all the abstraction insanity.

One must only look at the iPad 2 to see that it's very possible to make a fast computer with beautiful 60 FPS animations and snappy applications using only an 800 MHz dual core ARM CPU.

The iPad 2 is about as powerful as my Pentium 4 was back in the early 2000s. Shrinking that down to a tablet is certainly an accomplishment, but it doesn't deserve the shock and awe with which you present it. It's nice to have a device such as the iPad 2 to fill the time when you wish you had a computer, but it is in no way a full desktop substitute.

Ironically, my iPad feels way faster than my MacBook Pro most of the time.

Your MacBook is a fundamentally different device from your iPad. They may feel similar, but the similarity is purely superficial; the underlying operations are vastly different. If your MacBook is that sluggish, it's either because you're using an Apple product or you've got a PEBKAC error.

You don't need to be an expert in the microprocessor industry to know that the CPU performance race is over

Yes, you do. The CPU performance race has been over for the past 5 years, but not for the reason you think. It's over because AMD choked and threw in the towel. In 2007, AMD's flagship Phenom processor was bested by Intel's then worst-in-class Core 2 Quad Q6600 in almost every benchmark (if not every benchmark). In 2011, AMD's flagship eight-core Bulldozer processor was beaten by Intel's worst-in-class quad-core i7 920 from 2 years earlier, which also had the added handicap of having only 2 of its 3 memory channels populated with DIMMs. Don't blame AMD's failures on the market, or on Intel; blame them on AMD.

The fact that the CPU performance race is over doesn't mean that Intel has won; it merely means that Intel is the only competitor left, since AMD is now effectively a non-contender. It also doesn't mean that there is room in the desktop market for ARM CPUs, or that desktop hardware manufacturers are suddenly going to start writing drivers for two completely different architectures.

While it is certainly true that ARM is gaining on Intel in performance, it is still a long, long way behind, and that gap will only get harder to close as time goes on. This will be doubly difficult when ARM manufacturers try to catch up to Intel in general-purpose execution. It's easy enough to say that ARM leads in performance per watt if you ignore all of the special hardware capabilities that Intel CPUs have and ARM CPUs mostly lack, or if you forget that dynamic power consumption scales with the square of the voltage and that higher voltage is needed to sustain higher frequencies.
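A standard first-order model of CMOS switching (dynamic) power is P ≈ α·C·V²·f, where α is the activity factor, C the switched capacitance, V the supply voltage and f the clock frequency. The sketch below plugs in hypothetical values (1 nF effective capacitance, 1.0 V vs. 1.2 V) to show why voltage is so expensive; it deliberately ignores leakage power, which is a simplifying assumption.

```python
def dynamic_power(capacitance_f: float, voltage_v: float, freq_hz: float,
                  activity: float = 1.0) -> float:
    """First-order CMOS switching power, P = activity * C * V^2 * f, in watts.

    Leakage (static) power is deliberately ignored in this sketch.
    """
    return activity * capacitance_f * voltage_v ** 2 * freq_hz


# Hypothetical chip: 1 nF effective switched capacitance, 1 GHz at 1.0 V.
base = dynamic_power(1e-9, 1.0, 1e9)
more_voltage = dynamic_power(1e-9, 1.2, 1e9)   # +20% voltage, same clock
faster = dynamic_power(1e-9, 1.2, 1.5e9)       # +20% voltage, +50% clock

print(f"+20% V:         {more_voltage / base:.2f}x power")
print(f"+20% V, +50% f: {faster / base:.2f}x power")
```

A 20% voltage bump alone costs about 44% more power, and because higher clocks usually require higher voltage, the two effects compound.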

It's all about power consumption now, and X86 fails miserably at lower power computing. Unless you know something I don't.

I do know something you don't. Architectures aren't designed to scale infinitely in both directions on the power scale, yet Intel still manages to run full-featured dual-core processors in the 17-watt range that will destroy any dual- or quad-core ARM processor put up against them. Also, I'm not sure how you can justify the statement "it's all about power consumption," because for 95% of the desktop market heat is a non-issue, whereas a lack of performance certainly is one. If you live in a datacenter, the constant whine of fans and AC units can certainly get annoying, but as I mentioned above, there are already low-power solutions available without reinventing the wheel.

It strikes me as funny that AMD couldn't compete in a market with Intel and maybe VIA, yet somehow thinks it can compete in a market with 3+ strong competitors.

This feels like throwing the baby out with the bathwater.

Most perceived lag on a modern desktop comes from excessive abstraction which results in poor coding practices.

This is worthless without actual numbers, which I doubt you have. Hardware people blame software, software people blame hardware, as it has always been, so mote it be, amen.

Here, though, it is not about blaming software or hardware people.

Here is John Carmack describing how PC performance suffers from the many API layers between the program and the hardware:

John Carmack: ... That's really been driven home by this past project by working at a very low level of the hardware on consoles and comparing that to these PCs that are true orders of magnitude more powerful than the PS3 or something, but struggle in many cases to keep up the same minimum latency. They have tons of bandwidth, they can render at many more multi-samples, multiple megapixels per screen, but to be able to go through the cycle and get feedback... “fence here, update this here, and draw them there...” it struggles to get that done in 16ms, and that is frustrating.

Later in the article John expands on the thick software problem.

The article is here: http://pcper.com/reviews/Editorial/John-Carmack-Interview-GP...
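The 16 ms in Carmack's quote is the frame time at a 60 Hz refresh rate (1000 ms / 60 ≈ 16.7 ms). A minimal sketch of that budget follows; the per-stage costs are hypothetical placeholders, not measurements, chosen only to show how a few milliseconds of driver/API overhead can consume most of the slack.

```python
def frame_budget_ms(refresh_hz: float) -> float:
    """Time available to produce one frame at the given refresh rate, in ms."""
    return 1000.0 / refresh_hz


# Hypothetical per-frame costs in ms (illustrative placeholders, not measured).
stages = {
    "input handling": 1.0,
    "game/app logic": 4.0,
    "driver/API overhead": 6.0,
    "GPU command submission": 5.0,
}

budget = frame_budget_ms(60)
spent = sum(stages.values())
print(f"budget {budget:.2f} ms, spent {spent:.2f} ms, slack {budget - spent:.2f} ms")
if spent > budget:
    print("frame missed at this refresh rate")
```

At 120 Hz the same stages would no longer fit, since the budget halves to about 8.3 ms.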

That quote is a bit out of context. The paragraph starts "I don't worry about the GPU hardware at all. I worry about the drivers a lot...". He's talking specifically about GPU performance.
If you want numbers, try comparing the stack depth in a modern application's event handler to those from 10 years ago. Qt4 alone, for example, routinely approaches 50 calls deep just to update a canvas in response to a mouse event. Add to that a dozen or more layers between the compositing manager, window manager, X, display driver, and the kernel, and the end-to-end latency climbs through the roof.
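A minimal Python sketch of the stack-depth comparison: it wraps a handler in N pass-through layers (a hypothetical stand-in for toolkit dispatch, not Qt's actual call chain) and counts frames with `traceback.extract_stack()`. Each layer adds one frame, and each frame adds call overhead on the path from event to pixels.

```python
import traceback
from typing import Callable


def wrap(inner: Callable[[], int]) -> Callable[[], int]:
    """One pass-through dispatch layer (hypothetical; stands in for a toolkit hop)."""
    def layer() -> int:
        return inner()
    return layer


def build_dispatch_chain(handler: Callable[[], int], layers: int) -> Callable[[], int]:
    """Wrap `handler` in `layers` nested pass-through calls."""
    fn = handler
    for _ in range(layers):
        fn = wrap(fn)
    return fn


def on_mouse_event() -> int:
    # Stand-in for the real work; it just reports the current call-stack depth.
    return len(traceback.extract_stack())


direct_depth = on_mouse_event()
layered_depth = build_dispatch_chain(on_mouse_event, 50)()
print(f"extra frames from dispatch layers: {layered_depth - direct_depth}")
```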
I hope that the end of ever-higher GHz processors will make it viable again to optimize for code performance instead of for developer time.
Indirectly related: the size of current systems. A typical desktop system is written in about 200 million lines of code (about 10K books, or a library). VPRI (http://vpri.org/, co-founded by Alan Kay) is trying to build a roughly equivalent system in 20K LOC, or about one single book. And it looks like they can do it (5 years into the project, 1 more year to go).

Let's say it is possible. That would mean current systems are about ten thousand times bigger than they could be. That's 4 orders of magnitude. And even if it isn't a full 4 orders of magnitude, I'm willing to bet on 3.
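The back-of-the-envelope arithmetic, using only the figures in this comment (200 million vs. 20K lines): orders of magnitude are log10 of the size ratio, and the per-cause factors argued later in the thread (features 10x, code reuse 10x, DSLs 100x) multiply back to the same total.

```python
import math

typical_desktop_loc = 200_000_000  # ~200 million lines, the estimate quoted above
vpri_target_loc = 20_000           # ~20K lines, VPRI's stated goal

ratio = typical_desktop_loc / vpri_target_loc
orders = math.log10(ratio)
print(f"{ratio:,.0f}x bigger = {orders:.1f} orders of magnitude")

# Breakdown argued in the thread: features (10x), reuse (10x), DSLs (100x).
factors = {"features": 10, "reuse": 10, "DSLs": 100}
print(math.prod(factors.values()) == ratio)
```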

This is not yet about raw speed or latency. But when a system is at least 3 orders of magnitude bigger than it could be, something in it is vastly suboptimal. And runtime performance could very well be part of that "something".

Yes, but is that 20K LOC system equivalent in functionality to the larger systems? In every respect, and not just the ones you happen to care about?
Just the ones they happen to care about. I don't think that matters a great deal, however: people tend to care about the same things. Feature creep is what happens when you try to fully satisfy everyone, a few people at a time. Plus, if you want your missing feature, you can code it. I mean, you really can. Many components of that system take no more than 1K LOC; they really are accessible.

But that's something of a straw man. Even if you convince me that feature creep really is valuable, lack of features explains only 1 order of magnitude out of 4. That leaves 3, and I have two explanations for those.

First, they reuse their code. A lot. When they write a compiler, all phases (parsing, AST to intermediate language, optimizations, code generation) are done with the same tool (augmented Parsing Expression Grammars; search for the OMeta language for more details). When they draw something on the screen, be it a window frame, a drawing, or text, they again use a single piece of code. Mere factorization goes a long way. I'd say it explains about 1 order of magnitude as well.
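OMeta itself is a full language; the sketch below is only a toy illustration of the underlying idea, written as Python parser combinators with PEG semantics (ordered choice, greedy repetition). The grammar is written once, and swapping the semantic action turns the same parser from an evaluator into an AST builder, which is the kind of reuse described above. All names here are hypothetical.

```python
from typing import Any, Callable, Optional, Tuple

# A parser takes (text, pos) and returns (value, new_pos), or None on failure.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]


def lit(s: str) -> Parser:
    """Match the exact string `s`."""
    def parse(text: str, pos: int):
        return (s, pos + len(s)) if text.startswith(s, pos) else None
    return parse


def charset(chars: str) -> Parser:
    """Match one character from `chars`."""
    def parse(text: str, pos: int):
        if pos < len(text) and text[pos] in chars:
            return (text[pos], pos + 1)
        return None
    return parse


def seq(*parsers: Parser) -> Parser:
    """Match all parsers in order; fail if any fails."""
    def parse(text: str, pos: int):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return (values, pos)
    return parse


def choice(*parsers: Parser) -> Parser:
    """PEG ordered choice: the first alternative that succeeds wins."""
    def parse(text: str, pos: int):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse


def many(p: Parser) -> Parser:
    """Greedy zero-or-more repetition (PEG `*`); never fails."""
    def parse(text: str, pos: int):
        values = []
        while (result := p(text, pos)) is not None:
            value, pos = result
            values.append(value)
        return (values, pos)
    return parse


def action(p: Parser, fn: Callable[[Any], Any]) -> Parser:
    """Apply a semantic action to the value of a successful match."""
    def parse(text: str, pos: int):
        result = p(text, pos)
        return None if result is None else (fn(result[0]), result[1])
    return parse


def arithmetic(combine: Callable[[int, list], Any]) -> Parser:
    """Grammar: expr <- number (('+' / '-') number)*, semantics given by `combine`."""
    digits = seq(charset("0123456789"), many(charset("0123456789")))
    number = action(digits, lambda v: int(v[0] + "".join(v[1])))
    rest = many(seq(choice(lit("+"), lit("-")), number))
    return action(seq(number, rest), lambda v: combine(v[0], v[1]))


def evaluate(first, rest):
    total = first
    for op, n in rest:
        total = total + n if op == "+" else total - n
    return total


def to_ast(first, rest):
    tree = first
    for op, n in rest:
        tree = (op, tree, n)
    return tree


print(arithmetic(evaluate)("12+30-2", 0))  # (40, 7)
print(arithmetic(to_ast)("12+30-2", 0))    # (('-', ('+', 12, 30), 2), 7)
```

The same combinators could drive later compiler phases too, by matching over trees instead of strings, but that extension is beyond this sketch.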

Second, their use of specialized languages yields astonishing results: they can build a self-implementing compilation system in about 1000 lines (including a bunch of optimizations). 200 more lines get you a reasonably efficient implementation of JavaScript, 200 more get you Prolog, and a couple hundred more can get you just about any DSL you may want (external DSLs, not your average Ruby/Haskell combinator library). They implemented an equivalent of Cairo in 457 lines, which is about 100 times smaller (and quite efficient to boot, but that was a surprise bonus). They did a TCP/IP stack in about 160 lines, which again is about 100 times smaller than a typical C implementation. And they did all that with specialized languages that are themselves implemented in very little code. Based on that, I'd say their use of domain-specific languages explains about 2 orders of magnitude. (Don't take my word for it. See their latest progress report here: http://www.vpri.org/pdf/tr2011004_steps11.pdf )
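As a rough illustration of specification-as-data (not VPRI's actual TCP/IP code, which is written in their own languages), here is a Python sketch in which the fixed IPv4 header layout from RFC 791 is a plain table of (field, bit width) pairs, interpreted by one small generic function. The table and function names are hypothetical, and the sample bytes are a hand-made header.

```python
import ipaddress
from typing import Dict, List, Tuple

# Declarative layout of the fixed 20-byte IPv4 header (RFC 791), in wire order.
IPV4_HEADER: List[Tuple[str, int]] = [
    ("version", 4), ("ihl", 4), ("tos", 8), ("total_length", 16),
    ("identification", 16), ("flags", 3), ("fragment_offset", 13),
    ("ttl", 8), ("protocol", 8), ("checksum", 16),
    ("src", 32), ("dst", 32),
]


def parse_fields(spec: List[Tuple[str, int]], data: bytes) -> Dict[str, int]:
    """Generic interpreter: split the big-endian bits of `data` according to `spec`."""
    total_bits = sum(width for _, width in spec)
    nbytes = (total_bits + 7) // 8
    if len(data) < nbytes:
        raise ValueError(f"need {nbytes} bytes, got {len(data)}")
    # Drop any padding bits past the end of the spec.
    value = int.from_bytes(data[:nbytes], "big") >> (nbytes * 8 - total_bits)
    fields = {}
    shift = total_bits
    for name, width in spec:
        shift -= width
        fields[name] = (value >> shift) & ((1 << width) - 1)
    return fields


packet = bytes.fromhex("45000073 00004000 4011b861 c0a80001 c0a800c7")
hdr = parse_fields(IPV4_HEADER, packet)
print(hdr["version"], hdr["ihl"], hdr["ttl"], hdr["protocol"])
print(ipaddress.IPv4Address(hdr["src"]), "->", ipaddress.IPv4Address(hdr["dst"]))
```

Supporting another fixed-layout header would mean writing a new table, not a new parser.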

To sum up, we could argue that current systems are about 4 orders of magnitude too big. Of the 4, 1 may be debatable (lots of features). Another (not reusing and factorizing code) is obviously something that has Gone Wrong™ (I mean, it could have been avoided if we had cared about it). The remaining 2 (DSLs) are a Silver Bullet. Not enough to kill the Complexity Werewolf, but it sure makes it much less frightening. By the way, we should note that the idea of DSLs has been around for quite some time. Not using them so far may count as something that has Gone Wrong as well, though I'm not sure.

X86 fails miserably at lower power computing

x86 currently doesn't scale down to the level required for smartphones.

However, it is getting close in the tablet space. Estimates for the Tegra 3 are around 3-4 W TDP[1], while the Cedar Trail Atoms are around 5.5 W TDP[2]. In early 2012 Intel will release its Medfield Atom chips, which will make the competition even more interesting.

[1] http://semiaccurate.com/forums/showthread.php?t=4169

[2] http://www.extremetech.com/computing/94184-early-cedar-trail...

>"Unless you know something I don't."

Based on history, reports of the x86's death have been greatly exaggerated, ever since the late 1980s.

Here's a nice 1999 article from Ars: [http://arstechnica.com/cpu/4q99/risc-cisc/rvc-1.html]

and the archive.org version for those without IE4 or Netscape Navigator: [http://web.archive.org/web/19991129051550/http://arstechnica...]

Floating-point operations are an example of the hurdle faced by RISC processors such as ARM: RISC ideology suggests that dedicated FPU hardware and instructions should not be used, despite the performance hit that software implementations incur.

On the other hand, the x86 CISC approach has allowed for increased integration based on changing market demands over the past 20 years (e.g. FPU integration with the 80486 in 1989 and MMX in 1996 on the Pentium).

That sort of flexibility has advantages.

I just want to point out that the RISC/CISC distinction is anachronistic; it doesn't really apply to the desktop and server world anymore. Intel's x86 processors, for example, execute RISC-like micro-instructions internally behind a CISC-like interface, effectively blending both. That's what allowed them to race ahead of all competitors in the first place.
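To make the "CISC outside, RISC-like inside" idea concrete, the toy decoder below splits an x86-style `add` with a memory destination into load, add and store micro-ops, the way read-modify-write instructions are commonly described as being cracked. The operand syntax, the `MicroOp` type and the `tmp0` register are hypothetical simplifications; real decoders and their micro-op formats are far more complex and are not modeled here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MicroOp:
    op: str                 # "load", "add" or "store" in this toy model
    dst: str                # destination register or memory operand
    srcs: Tuple[str, ...]   # source registers or memory operands


def decode(instr: str) -> List[MicroOp]:
    """Toy decoder for 'add dst, src' in a simplified x86-like syntax.

    A register destination maps to one micro-op; a memory destination
    ('[reg]') is cracked into load / add / store through a temp register.
    """
    mnemonic, operands = instr.split(maxsplit=1)
    if mnemonic != "add":
        raise ValueError(f"toy decoder only handles 'add', got {mnemonic!r}")
    dst, src = (part.strip() for part in operands.split(","))
    if dst.startswith("[") and dst.endswith("]"):
        return [
            MicroOp("load", "tmp0", (dst,)),         # read memory into a temp
            MicroOp("add", "tmp0", ("tmp0", src)),   # register-only ALU op
            MicroOp("store", dst, ("tmp0",)),        # write the result back
        ]
    return [MicroOp("add", dst, (dst, src))]


for instr in ("add rcx, rax", "add [rbx], rax"):
    print(instr, "->", [(u.op, u.dst, u.srcs) for u in decode(instr)])
```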

edit: My apologies, I couldn't access the cited Ars Technica article.

The linked Ars Technica article in fact agrees with you, but retains the term to describe competing design philosophies.