Hacker News
by stephenjudkins 5309 days ago
This is a largely vapid and meaningless prediction by someone who doesn't demonstrate anything but the most superficial knowledge of the microprocessor industry. Perhaps he knows something we don't, but as far as I can tell he's only extrapolating current market trends.

Obviously Intel (once led by Andy Grove, author of "Only the Paranoid Survive") is aware of the threat posed by ARM. If someone could explain how Intel will fail to meet the challenges of getting x86's performance-per-watt to match ARM's, and how this compares to the challenges ARM vendors face in order to get raw performance up to Intel's level, I would love to read it. However, this post offers little such insight.

5 comments

One thing that could work against them is their cost structure. Intel is used to throwing a thousand or so engineers on each design. Having so many designers means they can squeeze out every last MHz, but it also means they need big margins and large volumes to recoup their costs.

The other big technical issue is the end of Dennard scaling. For most of the last three decades, scaling CMOS processes bought you three things: more transistors, higher frequency and lower power. Things are different now. We can't really scale frequency any more because we've run into the power wall. We used to get lower power at the same frequency by scaling the supply voltage, but this also required us to scale the threshold voltage (a device parameter). Unfortunately we can't scale the threshold voltage willy-nilly like in the past because leakage power increases for lower threshold voltages and is now a significant contributor to total power. We still get more transistors per unit area, but it's not clear whether the economic costs of building up new fabs and switching to a new process are offset by the benefits of having more transistors to play with.
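For anyone who wants to see the arithmetic, here is a rough sketch of idealized Dennard (constant-field) scaling next to the case where the supply voltage stops scaling. Everything is in normalized units, and the function name and the `scale_voltage` flag are made up for illustration; real processes deviate from these textbook rules.

```python
def dennard_scale(k, scale_voltage=True):
    """Idealized scaling of one process generation by a factor k > 1.

    All quantities are relative to the previous node, which is 1.0.
    """
    capacitance = 1.0 / k                        # smaller devices, less gate capacitance
    voltage = 1.0 / k if scale_voltage else 1.0  # supply voltage, if it can still scale
    frequency = float(k)                         # shorter gate delay allows a faster clock
    density = float(k * k)                       # transistors per unit area

    # Dynamic power per transistor, P ~ C * V^2 * f
    power_per_transistor = capacitance * voltage ** 2 * frequency
    return {
        "power_per_transistor": power_per_transistor,
        "power_density": power_per_transistor * density,
    }


classic = dennard_scale(2)                     # voltage scales with dimensions
stuck = dennard_scale(2, scale_voltage=False)  # voltage held constant (assumed case)
print(classic["power_density"], stuck["power_density"])
```

With voltage scaling, power per unit area stays flat even as frequency and density go up. With the voltage stuck, the same shrink raises power density by k^2 in this simplified model, which is the power wall in a nutshell.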

The bottom line is that it's not clear whether Intel's biggest competitive advantage, that of having a manufacturing process superior to everyone else's, is still that much of an advantage.

PS. One thing I find truly amazing is that Dennard predicted that we'd run into all these problems back in his landmark paper in 1974!

One thing that could work against them is their cost structure. Intel is used to throwing a thousand or so engineers on each design.

Exactly. It's also not obvious that the mobile chip market will pay a premium for Intel-calibre fabs. If it won't, then the question becomes whether a TSMC-made Atom is better than a TSMC-made ARM.

It's also fairly common to have custom hardware added to SoCs. Is Intel prepared to open up their processes to that sort of thing?

Unfortunately we can't scale the threshold voltage willy-nilly like in the past because leakage power increases for lower threshold voltages and is now a significant contributor to total power.

This one cuts both ways though. With leakage dominating active power, within a given node, the fabrication process will be relatively more important than microarchitecture, which is a point in Intel's favour.

This one cuts both ways though. With leakage dominating active power, within a given node, the fabrication process will be relatively more important than microarchitecture, which is a point in Intel's favour.

This is kinda nitpicking, but I'm not sure leakage will ever dominate active power. We still have the ability to reduce leakage if we want, we just have to give up frequency for it. In the past we didn't have to play this trade-off but even now I don't think it ever makes sense to run your chip so fast that leakage is more than dynamic power.

I do agree that for any given node, Intel is still going to be ahead of the rest. It'll be interesting to see how much this helps them.

Leakage power is actually a very significant part of the total power usage [1] and one of the bigger reasons why Intel developed the tri-gate technology [2].

Active power is the one that's related to the frequency (P ~= CV^2f). Leakage power will "leak" even if the transistor is not switching.

1. http://www.eetimes.com/electronics-news/4215605/Leakage-powe...
2. http://realworldtech.com/page.cfm?ArticleID=RWT050511195446

Not sure what you mean by significant, but typical leakage power numbers are something like 15-30% of total power.

Maybe you're referring to some papers that used to come out a few years ago which suggested that leakage power will dominate total power. As I said above, this is unlikely to happen. It doesn't make sense to operate at a combination of supply voltage (Vdd) and threshold voltage (Vt) where leakage dominates total power. I think these papers misunderstood the fact that threshold voltage and hence leakage itself is a knob that the device manufacturing folks can control.

Active power is the one that's related to the frequency (P ~= CV^2f). Leakage power will "leak" even if the transistor is not switching.

If you're implying that leakage power doesn't affect frequency, you are wrong. Transistor speed depends on the gate overdrive which, for modern velocity-saturated devices is proportional to Vdd-Vt. Leakage power itself is proportional to exp(-Vt). There is a clear trade-off here between how fast you run your chip and how much it will leak.
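A small sketch of that trade-off, using only the two proportionalities above: speed tracks the gate overdrive Vdd-Vt and leakage tracks exp(-Vt). The function names, the 1.0 V supply and the `slope` constant (how many volts of Vt it takes to cut leakage by a factor of e) are illustrative assumptions, not data from any real process.

```python
import math


def relative_speed(vdd, vt):
    """Speed proxy: gate overdrive Vdd - Vt (velocity-saturated device)."""
    return vdd - vt


def relative_leakage(vt, slope=0.1):
    """Leakage proxy: exp(-Vt / slope), with slope in volts (assumed value)."""
    return math.exp(-vt / slope)


vdd = 1.0  # assumed supply voltage, in volts
for vt in (0.2, 0.3, 0.4, 0.5):
    speed = relative_speed(vdd, vt)
    leak = relative_leakage(vt)
    print(f"Vt={vt:.1f} V  speed={speed:.2f}  leakage={leak:.4f}")
```

Lowering Vt buys speed linearly but costs leakage exponentially, which is why the choice of Vt is a design knob rather than a free lunch.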

The papers I've seen point to values larger than 15~30% - I've seen ~50% cited for geometries as large as 65nm, only to get worse as we go to even smaller feature sizes. [1]

Threshold voltage is not really an effective knob, unless you assume that the feature size is a knob and go against Moore's law, or that a brand new, once-in-10-years process innovation is a knob that designers can pick out of a hat. I don't think anyone's clamoring for a return to 130nm parts on a smartphone. At each new process node, you're going to lose some of the control you have over Vth.

This is basically what Intel did with the tri-gate transistors, which give them a longer lease on life until they bump up against subthreshold leakage. TSMC is on its first generation of high-k metal gates, and still a process node or two away from jumping over to the tri-gate party.

1. http://www.eetimes.com/design/eda-design/4211228/Overcoming-...

One rule of thumb I've seen writers on Anandtech express several times is that any given microarchitecture can cover at most about one order of magnitude for power consumption. This leads to laptop/desktop-oriented microarchitectures that can scale at most from about 13W to 130W by tweaking clock speeds and voltage. More recently, the high end has dropped down to at most about 95W for non-server chips, but it still means you have to go back to the drawing board before you have a CPU that can work in tablets and smaller devices.

So far, Intel has shown that they aren't very good at simultaneously developing two parallel product lines of CPUs. Their tick/tock strategy of alternating process shrinks and microarchitecture updates has been working great for years, but Atom has clearly been neglected. Prior to that, they had the P4 NetBurst architecture and the P6-based Pentium Ms on the market at the same time, but NetBurst hit a wall and the company lost a lot of ground to the Athlon 64 before they could come up with a high-performance successor to the Pentium M.

95W / 10 = 9.5W which is a problem but 1/4th the cores = 2.4W which is reasonable.
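Spelling that arithmetic out as a sketch (the helper names are mine, and scaling power linearly with core count is a simplifying assumption that ignores shared caches, memory controllers and so on):

```python
def power_floor(max_watts, span=10.0):
    """Lowest power a microarchitecture reaches if it spans one order of magnitude."""
    return max_watts / span


def scale_cores(watts, core_fraction):
    """Power after keeping only a fraction of the cores (assumes linear scaling)."""
    return watts * core_fraction


floor = power_floor(95.0)           # bottom of the range for a 95W part
quarter = scale_cores(floor, 0.25)  # same design with a quarter of the cores
print(floor, quarter)
```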
>lost a lot of ground to the Athlon 64 before they could come up with a high-performance successor to the Pentium M.

You realize that Intel's EM64T is effectively AMD64, right? Intel, which loathed that cross-patent deal, is now reaping AMD's rewards. Core 2 and later series processors are all using AMD's intellectual property (legally).

Intel is most certainly not using AMD's architecture. They are using AMD's instruction set, yes, but the internals have nothing to do with AMD's designs. The Core series was an evolution of the Pentium M, which was based on the Pentium III architecture.

Yes. Intel has more than made up for the mess they were in circa 2003-2005, but the fact remains that AMD was able to truly embarrass Intel for quite some time, both by beating Intel to market with several new technologies, and by significantly eroding Intel's market share for desktop and server chips.

Seems like a harsh characterization. When I read articles like this from a person who is enthusiastic and smart but not widely experienced yet, I read it in the frame of mind that this is what the author wishes were true, rather than as something that actually is true. In this case Andrew, who is a student and has been interning at Google apparently, would really like the world to leave the "x86" behind and move on to something presumably more akin to whatever he happens to think should be a worthy successor.

That being said, in terms of CPUs being shipped that are 'customer facing' and programmable with applications from multiple third parties, ARM chips in 'smart' phones and tablets are taking up a bigger chunk of the pie than any previous instruction set architecture (ISA). That includes both PowerPC (Apple products) and Motorola's 68K architecture (Sun and Apple products).

However, what Andrew misses out on completely is the distinction between systems and processors and the effect that has on adoption rate. This 'secret weapon', which guards the x86 ISA from death like the charm on Harry Potter's head, was put there by IBM in 1981.

In 1981 IBM shipped its first "Personal Computer", and because doing that was new to IBM and they expected mostly hobbyists to buy them, the 'hardware information' manual came with schematics, a BIOS listing, and details of where all the various chips were addressed and how those chips would work. Then as its popularity soared, it was 'cloned' (and this is very important), right down to the register level and with identical BIOS code. The parts were available from non-IBM sources and there was really nothing preventing an engineer from doing it except the off chance that IBM would sue them for something.

As it turned out they did sue for copyright violation on the BIOS code, but that was really all they could do: the schematic could be copyrighted but implementations of the schematic could not. Once someone had implemented a BIOS in a 'clean room' and the legitimacy of that BIOS had been successfully litigated, the door was opened and the 'PC' business was born. The key here, however, was that every single one of them was register and peripheral compatible.

Another event happened at this time which helped seal the charm. Microsoft started selling MS-DOS, which was software compatible with PC-DOS (which is to say it had the same APIs) but could run on hardware that was not register compatible. Intel made a high integration chip, the 80186, which you could think of as an ancestor of today's system-on-a-chip (SoC) ARM chips. It ran MS-DOS, but because the registers and peripherals were slightly different (better engineering wise, but different), programs that ran on PCs would not run on it if they talked to, say, the interrupt controller or the keyboard processor. Thus the term 'well behaved' programs was born, and they were few and far between. On the other side was Microsoft Flight Simulator, which, in order to get any sort of performance at all, talked almost exclusively to the bare metal and became the barometer of 'clone'-ness. The question "Can it run Flight Simulator?" was a buyer discriminator, and if the answer was 'no' then sales were disappointing.

Those two events cemented for almost two decades the definition of what it meant to be a 'PC'.

Into those decades billions of person-hours were invested in software and tools and programs and features. Microsoft and Intel regularly got together with OEMs and chip makers and system builders to define all of the details, the same details that were originally in the PC Hardware Manual, that everyone would agree constituted a "PC". These became known as the "PC-98" standard (for PCs built after 1998) or the "PC-2000" standard. Things like power supplies, keyboards, board form factors and slot configurations all became sub processes within that ecosystem and followed the lead of this over-arching standard. Obscure stuff like what the thread pitch would be on the screws that sealed the cabinet, not so obscure stuff like the dimensions of the 'cut out' for built in peripheral ports. And during all that time the basic registers, the boot sequence, what BIOS provided, and the set of things that could be counted on to exist so that you could boot to a point to discover the new stuff all remained constant.

ARM doesn't have any of that. ARM, as an ISA, is controlled by a company that doesn't build chips, doesn't sell systems using those chips, and is not affected by 'stupid' choices in their architecture. All of that is offloaded to the 'ARM licensees.' And since anyone can license an ARM chip, they do. And that means you have ARM chips in FPGAs and ARM chips from embedded processor manufacturers, and ARM chips from video graphics companies. They are all different. Worse, they all boot differently, they all have different capabilities, they don't talk to a standard graphics configuration, they don't have a standard I/O configuration, they don't have a place where USB ports are expected to appear, or a standard way of asking 'what device is booting me and can I ask it for data?' Quite simply there is no standard ARM system.

And because they don't have a standard system, there isn't any leverage. It's like running a race with lead shoes: possible, but very tiring.

Now some folks, and Andrew here is clearly one of them, think the system problem is solved by 'Android.' They believe that because software developers can write to Android APIs and have their code run on all Android machines, they are done. Except that getting Android to run on an ARM system is painful. And worse, the 'high volume' Android systems have features at different places (where the accelerometer is, how the graphics work, can it do 2D acceleration or not?). There is no Android 'PC' which gets to define all the detail bits and thus free manufacturers from the grip of having to hire expensive software types to figure this out.

In the end I agree with Stephen's comment that "If someone could explain how Intel will fail to meet the challenges of getting x86's performance-per-watt to match ARM's...." is a red herring, since Intel has literally years of runway to do that. Meanwhile ARM platforms are dying (Playbook anyone?) because the cost to make them pushes them out beyond what the market will bear (and yes, the iPad/iPhone are keeping a lid on what you can charge for one of these things).

This is insightful, but I think you've pointed out the solution (for ARM) at the same time as the problem. Is there any reason that Microsoft couldn't repeat their earlier work and define an ARMPC-2013 standard? It seems like this will be necessary if they want Windows8-on-ARM to be a useable proposition.

The trick is to get a system standard in place; that has the tendency to commoditize the chips, and Intel was in a position to control the value chain (the price of the CPU is still a disproportionate cost of the overall system). So someone has to create the standard on faith that the increased volume will make up for the price pressure that comes with commoditization.

Now ARM could come up with a spec for the 'ARM System Standard' and license/certify that. That has some possibility if someone like Google made sure that the Android kernel always ran on the 'reference design' standard. But that level of strategic thinking has been very hard to co-ordinate to date.

It seems to me that Microsoft is in a perfect position to promulgate such a standard - they don't care if the hardware is commoditized (in fact they would welcome it). It also appears that "will run Windows 8" should be a sufficient carrot to convince manufacturers to build to the spec.

As you say, Google is in a similar position, so perhaps a Microsoft/Google jointly supported standard makes a certain amount of sense, as odd as that sounds...

Hello, I'm the author of this article. I do not have a degree in electrical or computer engineering. I'm merely stating the trends I've seen in the PC industry over the last few years.

I am, however, a Software Engineer. I know that most of the perceived lag on a modern desktop is not due to the CPU, but inefficient I/O to the hard disk or network. One must only look at the iPad 2 to see that it's very possible to make a fast computer with beautiful 60 FPS animations and snappy applications using only an 800 MHz dual core ARM CPU. Ironically, my iPad feels way faster than my Macbook Pro most of the time.

You don't need to be an expert in the microprocessor industry to know that the CPU performance race is over. It's all about power consumption now, and X86 fails miserably at lower power computing. Unless you know something I don't.

I know that most of the perceived lag on a modern desktop is not due to the CPU, but inefficient I/O to the hard disk or network

Most perceived lag on a modern desktop comes from excessive abstraction which results in poor coding practices. You could argue that IO bottlenecks or a lack of system resources will have an impact, but that impact won't be realized until the environment is somewhat saturated. A simple solution to the hard drive bottleneck is to throw a SATA3 SSD in there instead, or to give a system more RAM to boost disk caching; problem solved. On the other hand, no amount of system resources will alleviate a performance hit caused by shoddy coding. This is the reason that I refuse to use Google Docs: the performance is about as good as WordPerfect on Windows 95 because of all the abstraction insanity.

One must only look at the iPad 2 to see that it's very possible to make a fast computer with beautiful 60 FPS animations and snappy applications using only an 800 MHz dual core ARM CPU.

The iPad 2 is about as powerful as my Pentium 4 was back in the early 2000s. Shrinking it down to that level is certainly an accomplishment but it's not worth the shock and awe that you present it to be. It's nice to have a device such as the iPad 2 to fill the time when you wish you had a computer but it is in no way a full desktop substitute.

Ironically, my iPad feels way faster than my Macbook Pro most of the time.

Your MacBook is a fundamentally different device than your iPad. They may feel similar but this is purely superficial, the underlying operations are vastly different. If your MacBook is that sluggish, it's either because you're using an Apple product or you've got a PEBKAC error.

You don't need to be an expert in the microprocessor industry to know that the CPU performance race is over

Yes you do. The CPU performance race has been over for the past 5 years but not for the reason you think it is. The CPU performance race is over because AMD choked and threw in the towel. In 2007 AMD's flagship Phenom processor was bested by Intel's then worst in class Core2Quad Q6600 in almost every benchmark (if not every benchmark). In 2011 AMD's flagship octal core Bulldozer processor was beaten by Intel's worst in class quad core i7 920 from 2 years ago, which also had the added handicap of only having 2 of its 3 memory channels loaded with DIMMs. Don't blame AMD's failures on the market, or Intel; blame them on AMD.

The fact that the CPU performance race is over doesn't mean that Intel has won, it merely means that Intel is the only competitor since AMD is effectively now a non-contender. It also doesn't mean that there is room in the desktop market for ARM CPUs, or that desktop hardware manufacturers are suddenly going to start writing drivers for two completely different architectures.

While it is certainly true that ARM is gaining on Intel in the performance space, it is still a long, long way behind and that gap is only going to get harder and harder to close as time goes on. This is going to be doubly difficult when ARM manufacturers try to catch up to Intel in the general purpose execution department. It's easy enough to say that ARM has a lead in performance per watt if you ignore all of the special hardware capabilities that Intel CPUs have which are mostly absent on ARM, or if you forget that power consumption scales roughly quadratically with voltage and that voltage is necessary to maintain a higher frequency.

It's all about power consumption now, and X86 fails miserably at lower power computing. Unless you know something I don't.

I do know something you don't. Architectures aren't designed to scale infinitely in both directions on the power scale yet Intel still manages to operate dual core full featured processors in the 17 watt range that will still destroy any dual or quad core ARM processor that gets put up against it. Also, I'm not sure how you can justify your statement "it's all about power consumption" because for 95% of the desktop market heat is a non issue whereas a lack of performance certainly is. If you live in a datacenter the constant whine of fans and AC units can certainly get annoying but as I mentioned above, there are already low power solutions that can be had without reinventing the wheel.

It strikes me as funny how AMD couldn't compete in a market with Intel and maybe Via, but somehow they think they can compete in a market with 3+ strong competitors.

This feels like throwing the baby out with the bathwater.

Most perceived lag on a modern desktop comes from excessive abstraction which results in poor coding practices.

This is worthless without actual numbers, which I doubt you have. Hardware people blame software, software people blame hardware, as it has always been, so mote it be, amen.

Here though it is not about blaming software or hardware people.

Here is what John Carmack says about his troubles with the lack of PC performance due to the multitude of APIs to reach the hardware:

John Carmack: ... That's really been driven home by this past project by working at a very low level of the hardware on consoles and comparing that to these PCs that are true orders of magnitude more powerful than the PS3 or something, but struggle in many cases to keep up the same minimum latency. They have tons of bandwidth, they can render at many more multi-samples, multiple megapixels per screen, but to be able to go through the cycle and get feedback... “fence here, update this here, and draw them there...” it struggles to get that done in 16ms, and that is frustrating.

Later in the article John expands on the thick software problem.

The article is here: http://pcper.com/reviews/Editorial/John-Carmack-Interview-GP...

That quote is a bit out of context. The paragraph starts "I don't worry about the GPU hardware at all. I worry about the drivers a lot...". He's talking specifically about GPU performance.

If you want numbers, try comparing the stack depth in a modern application's event handler to those from 10 years ago. Qt4 alone, for example, routinely approaches 50 calls deep just to update a canvas in response to a mouse event. Add to that a dozen or more layers between the compositing manager, window manager, X, display driver, and the kernel, and the end-to-end latency climbs through the roof.
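One rough way to get such numbers, sketched here in Python rather than against Qt itself: count the frames on the call stack from inside a handler, with and without layers of indirection in front of it. The `wrap` and `with_layers` helpers are hypothetical stand-ins for framework layers, not part of any real toolkit.

```python
import inspect


def current_depth():
    """Count the frames on the call stack, starting from this function."""
    depth = 0
    frame = inspect.currentframe()
    while frame is not None:
        depth += 1
        frame = frame.f_back
    return depth


def handler():
    """Innermost event handler: report how deep in the stack it runs."""
    return current_depth()


def wrap(fn):
    """Add one layer of indirection, standing in for a framework layer."""
    def layer(*args, **kwargs):
        return fn(*args, **kwargs)
    return layer


def with_layers(fn, n):
    """Put n pass-through layers in front of fn."""
    for _ in range(n):
        fn = wrap(fn)
    return fn


extra_frames = with_layers(handler, 50)() - handler()
print(extra_frames)
```

Stack depth is only a proxy, of course; it says how much indirection sits in front of the handler, not how long each layer takes.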
I hope that the end of higher GHz processors will make it viable again to optimize for code performance instead of optimizing for developer time.

Indirectly related: the size of current systems. A typical desktop system is written in about 200 million lines of code (about 10K books, or a library). http://vpri.org/ (co-founded by Alan Kay) is trying to make a roughly equivalent system in 20K LOC, or about one single book. And it looks like they can do it (5 years into the project, 1 more year to go).

Let's say it is possible. That would mean current systems are about ten thousand times bigger than they could be. That's 4 orders of magnitude. And even if it isn't 4 full orders of magnitude, I'm willing to bet on 3.
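The back-of-the-envelope version, using the two figures from the comment above (the variable names are just for illustration):

```python
import math

current_loc = 200_000_000  # typical desktop system, per the figure above
target_loc = 20_000        # the goal quoted for the VPRI system

ratio = current_loc / target_loc
orders_of_magnitude = math.log10(ratio)
print(ratio, orders_of_magnitude)
```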

It is not yet about raw speed, or latency. But when a system is at least 3 orders of magnitude bigger than it could be, it does mean that something there is vastly suboptimal. And runtime performance could very well be part of that "something".

Yes, but is that 20K LOC system equivalent in functionality to the larger systems? In every respect, and not just the ones you happen to care about?

Just the ones they happen to care about. I don't think it matters such a great deal, however: people tend to care about the same things. Feature creep is when you want to fully satisfy everyone, a few people at a time. Plus, if you want your missing feature, you can code it. I mean, you really can. Many components of that system don't take more than 1K LOC; they really are accessible.

But that's kind of a straw man. Even if you convince me that feature creep really is valuable, lack of features explains but 1 order of magnitude out of 4. There's still 3 to go. I have two explanations for those.

First, they reuse their code. A lot. When they write a compiler, all phases (parsing, AST to intermediate language, optimizations, code generation) are done with the same tool (augmented Parsing Expression Grammars, search for the OMeta language for more details). When they draw something on the screen, be it a window frame, a drawing, or text, they again use a single piece of code. Mere factorization goes a long way. I'd say it explains about 1 order of magnitude as well.

Second, their use of specialized languages yields astonishing results: they can build a self-implementing compilation system in about 1000 lines (including a bunch of optimizations). 200 more lines gets you a reasonably efficient implementation of Javascript, 200 more gets you Prolog, and a couple hundred more can get you about any DSL you may want (external DSLs, not your average Ruby/Haskell combinator library). They implemented an equivalent of Cairo in 457 lines, which is about 100 times smaller (and quite efficient to boot, but that was a surprise bonus). They did a TCP-IP stack in about 160 lines, which again is about 100 times smaller than a typical C implementation. And they did all that with specialized languages that themselves are implemented in very little code. Based on that, I'd say their use of domain specific languages explains about 2 orders of magnitude. (Don't take my word for it. See their last progress report here: http://www.vpri.org/pdf/tr2011004_steps11.pdf )

To sum up, we could argue that current systems are about 4 orders of magnitude too big. Of the 4, 1 may be debatable (lots of features). Another (not reusing and factorizing code) is obviously something that has Gone Wrong™ (I mean, it could have been avoided if we cared about it). The remaining 2 (DSLs) are a Silver Bullet. Not enough to kill the Complexity Werewolf, but it sure makes it much less frightening. By the way, we should note that the idea of DSLs has been around for quite some time. Not using them so far may count as something that has Gone Wrong as well, though I'm not sure.

X86 fails miserably at lower power computing

x86 currently doesn't scale down to the level required for smartphones.

However, it is getting close in the tablet space. Estimates for the Tegra 3 are around 3-4W TDP[1], while the Cedar Trail Atoms are around 5.5W TDP[2]. In early 2012 Intel will release their Medfield Atom chips, which will make the competition even more interesting.

[1] http://semiaccurate.com/forums/showthread.php?t=4169

[2] http://www.extremetech.com/computing/94184-early-cedar-trail...

>"Unless you know something I don't."

Based upon history, reports of the x86's death have been greatly exaggerated - since the late 1980s.

Here's a nice 1999 article from Ars: [http://arstechnica.com/cpu/4q99/risc-cisc/rvc-1.html]

and the archive.org version for those without IE4 or Netscape Navigator: [http://web.archive.org/web/19991129051550/http://arstechnica...]

Floating Point operations are an example of the hurdle faced by the RISC processors such as ARM - RISC ideology suggests that dedicated FPU hardware and instructions should not be used despite the performance hit that software implementations incur.

On the other hand, the x86 CISC approach has allowed for increased integration based on changing market demands over the past 20 years (e.g. FPU integration with the 80486 in 1989 and MMX in 1996 on the Pentium).

That sort of flexibility has advantages.

I just want to point out that the RISC/CISC distinction is anachronistic; it doesn't really apply anymore to the desktop and server world. Intel's x86 processors, for example, execute RISC-like micro-instructions behind a CISC-like interface, effectively blending both. It's what allowed them to race ahead of all competitors in the first place.

edit: My apologies, I couldn't access the cited arstechnica article.

The linked Ars Technica article in fact agrees with you, but retains the term to describe competing design philosophies.

Also the beginning of "good enough" performance? In recent years client side software has not generated any new breakthroughs that require vast amounts of computing power. Most OpenGL 2+ games look more or less the same, raytracing hasn't taken off in spite of Intel's best efforts, and there are multiple solutions to do video decode in hardware. I think Intel sorely needs something to come along that causes regular users to want to pay top dollar for top performance.

The only thing I run with any regularity that feels like it could use more raw horsepower is the Clang/LLVM static-analyzer.

Adding an SSD seemed to make no difference, but with luck, the software will get some love and speed up.