by hajile 1490 days ago
This doesn't seem to be the best-researched article out there.

If they thought Itanium was bad, they should have looked into the i860. Itanium was an attempt to fix a bunch of the i860 ideas. i860 quickly went from a supercomputer chip to a cheap DSP alternative (where it had at least the hope of hitting more than 10% of its theoretical performance).

Intel iAPX 432 was preached as the second coming back in the 80s, but failed spectacularly. The i960 was take 2 and their joint venture called BiiN also shuttered. Maybe Rekursiv would be worthy of a mention here too.

We now know that core 2 dropped all kinds of safety features resulting in the Meltdown vulnerabilities. It also partially explains why AMD couldn't keep up as these shortcuts gave a big advantage (though security papers at the time predicted that meltdown-style attacks existed due to the changes).

Rather than an "honorable mention", the Cell processor should have easily topped the list of designs they mentioned. It was terrible in the PS3 (with few games if any able to make full use of it) and it was terrible in the couple supercomputers that got stuck with it.

I'd also note that Bulldozer is maligned more than it should be. There's a lot to like about the concept of CMT, and for the price, they weren't the worst. I'd even go so far as to say that if AMD hadn't been so starved for R&D money during that period, they might have been able to make it work. ARM's latest A510 shares more than a few similarities. A big/little or big/little/little CMT architecture seems like a very interesting approach to explore in the future.

13 comments

I was also surprised the iAPX 432 wasn't on the list. It seems to be the Itanium's granddaddy. It was expensive, targeted to enterprises rather than everyone, tried to push the boundaries (32-bit for the 432, 64-bit for Itanium), and relied on VLIW instruction sets that were beyond the capabilities of compilers. The resemblance is striking.

As for Bulldozer, I was saddled with one for a while. Where it really fell down was (surprise!) its floating point performance. That FPU shared between two integer units makes for some "interesting" performance characteristics when trying to run multiple FP-heavy tasks, but overall, it was merely mediocre rather than terrible. I'm glad AMD hit it out of the park with Zen.

Bulldozer gets too much hate IMO. Okay, the instructions per clock cycle were bad and power consumption was high but you can't forget that the FX-6300 was $100 for a >3-core chip that could be overclocked by another 0.7 GHz without issue. The price-performance ratio was better than anything Intel fielded. I'm still running it today.
Bulldozer has got a lot of hate mostly because of false advertising and because of a series of blog articles written by AMD marketing people before its launch in 2011, which created very wrong expectations about its characteristics.

The wrong expectations and false advertising have centered on the fact that the first Bulldozer was described as an 8-core CPU, which would easily crush its 4-core competition from Intel (Sandy Bridge).

What the AMD bloggers have forgotten to mention was that the new Bulldozer cores were much weaker than the cores of their previous CPU generations, being able to execute only 2 instructions per cycle, while an Intel core could execute 4 instructions per cycle (and the previous AMD cores could execute 3 instructions per cycle). So a Bulldozer core only had the performance of a single thread of the 2 threads of an Intel core, for multi-threaded tasks, with the additional disadvantage that the resources of 2 AMD cores could not be allocated to a single thread when the second core of a module was idle.

So an 8-core Bulldozer could barely match the multi-threaded performance of a 4-core Sandy Bridge, while being much slower on single-thread tasks.
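
(A rough sketch of that arithmetic, assuming only the peak issue widths quoted above: 2 instructions per cycle for a Bulldozer core and 4 for a Sandy Bridge core. The function and variable names are made up for illustration, and real throughput also depends on clocks, caches and workload.)

```python
# Back-of-envelope peak instructions per cycle, using the issue widths
# quoted in the comment above. This is not a benchmark model.

def peak_ipc(units: int, ipc_per_unit: int) -> int:
    """Aggregate peak instructions per cycle across all units."""
    return units * ipc_per_unit

bulldozer_cores, bulldozer_ipc = 8, 2        # 4 modules x 2 integer cores
sandy_bridge_cores, sandy_bridge_ipc = 4, 4  # 4 cores, each shared by 2 threads

bulldozer_mt = peak_ipc(bulldozer_cores, bulldozer_ipc)
sandy_bridge_mt = peak_ipc(sandy_bridge_cores, sandy_bridge_ipc)

# A single thread only ever gets one core's issue width.
single_thread_ratio = bulldozer_ipc / sandy_bridge_ipc

print(bulldozer_mt, sandy_bridge_mt, single_thread_ratio)
```

On these assumptions the multi-threaded peaks come out equal while a single Bulldozer thread gets half the issue width, which is the "barely match, much slower single-thread" picture described above.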

If one would have known since the beginning that the Bulldozer cores had been intentionally designed to be much weaker than the old AMD cores and than the Intel cores, this would not have been a surprise and everybody for whom the price/performance ratio was more important than the performance would have been happy to buy Bulldozer CPUs.

However, after many months during which AMD claimed that their supposedly 8-core CPU would be better than any other CPU with fewer cores, there was a huge disappointment caused by the first tests after launch, which immediately revealed the pathetic performance of the new cores, which for single-thread tasks were much slower than the previous AMD CPUs.

So all the hate has been caused by the stupid actions of the AMD management and marketing, who lied continuously about Bulldozer, even if they should have thought that this is useless, because the independent benchmarks will reveal the truth immediately after launch.

To correctly set expectations about Bulldozer vs. Sandy Bridge, what AMD called a 4-module 8-core CPU should have been called a 4-core 8-thread CPU, but one with dynamic allocation inside a core (module in AMD jargon) only for the FPU, while the integer resources are allocated statically. With this correct description there would have been no surprise about the behavior of Bulldozer.

A part of the hate is also due to some engineering decisions whose reasons are a mystery even now, because if you had randomly queried a thousand logic design engineers before 2011, all or almost all would have said that they were bad decisions, so it is hard to understand how they could have been promoted and approved inside the AMD design teams.

For example, since the Opteron launch in 2003 and until Intel launched Sandy Bridge in 2011, the largest advantage in performance of the AMD CPUs was in the computations with large numbers, because the AMD CPUs could do integer multiplications much faster than the Intel CPUs.

The Intel designers have recognized that this is a problem, and during the 2006-2011 interval they have decreased every year the number of clock cycles required for operations like multiplications and divisions, so that Penryn began to approach the AMD throughput per clock cycle, Nehalem & Westmere matched the AMD throughput, while Sandy Bridge achieved a double throughput in comparison with the old AMD CPUs.

While Intel worked diligently to improve the performance of their cores, what did AMD do?

Someone at AMD has decided for an unknown reason that there is no need for Bulldozer to keep their existing computational performance, but it is enough to have integer multipliers with a throughput equal to a half of their current throughput and equal to only a quarter of their Sandy Bridge competitor (Intel had announced much in advance, by more than a year before launch, that Sandy Bridge will double the integer multiplication throughput over Nehalem, and it was anyway an obvious trend of the evolution of their previous cores; so the higher performance of the competition could not have been a surprise for the AMD designers).

The downgraded integer multipliers have crippled the performance of the new AMD CPUs for certain applications where their previous CPUs had been the best, while enabling only a negligible reduction in the core area.
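
(The ratios quoted above are easy to sanity-check. A minimal sketch, assuming the pre-Bulldozer AMD multiplier throughput is normalised to 1; the variable names are hypothetical and nothing here is a measured figure.)

```python
from fractions import Fraction

# Normalise the old (pre-Bulldozer) AMD multiplier throughput to 1.
old_amd = Fraction(1)

# "Sandy Bridge achieved a double throughput in comparison with the old AMD CPUs"
sandy_bridge = 2 * old_amd

# "a throughput equal to a half of their current throughput"
bulldozer = old_amd / 2

# Bulldozer relative to its Sandy Bridge competitor.
bulldozer_vs_sandy_bridge = bulldozer / sandy_bridge

print(bulldozer_vs_sandy_bridge)
```

Half of the old throughput against a competitor at double the old throughput is where the "only a quarter" figure comes from.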

Price-to-performance is the last resort of a company that has failed at taking the performance crown.

Nobody cuts prices more than they have to, but everyone adjusts prices to where they need to go to sell the product. Bulldozer was priced low because it was genuine garbage; it was actually slower than Phenom in a lot of cases (which blows the "it was about price to performance!" thing out of the water - nobody regresses performance on purpose).

(and before people wind up about the obvious counterexample: Ryzen was priced low because a 1800X was genuinely a lot slower than a 5960X in productivity tasks due to latency and poor AVX performance, and got completely smoked in gaming. If they had tried to go head-to-head with Intel at $1000 pricing they wouldn't have sold anything because it would have been a far inferior package to what Intel offered, they had to cut prices by around half to make it a compelling offering. And even then it was not that appealing compared to, say, a 5820K.)

Companies need to make enough of a showing to attract consumers but if a company prices something super aggressively, there's often a catch. And that's bulldozer in a nutshell. Oh shit the product sucks. What can we charge for a mediocre "8-core" (sorta) that underperforms the 4-core i7? Offer it at i5 pricing and see if anyone bites. If they had managed to achieve good performance, they would have priced it appropriately.

(the other thing is - people prefer to make the comparison about the FX-8350, but that's not Bulldozer, that's Steamroller. Bulldozer was the FX-8150/FX-6350, which actually did outright regress performance vs a Phenom X6, and was priced relatively steeply due to "8 real cores". Bulldozer went up against Sandy Bridge, Steamroller was more of an Ivy Bridge/Haswell competitor, and that's where prices really started to drop. It isn't a huge difference but Intel was making some progress too in those days.)

Price chart: https://www.anandtech.com/show/4955/the-bulldozer-review-amd...

But as a consumer, all I really care about is price/perf (and maybe power and a few other variables). Far too much of the tech industry runs around talking about how great the top dog (this week) is because they bin, push the engineering margins and sell some golden chip that ends up being .0000001% of their product line for some crazy $$$$$.

During the early part of the Bulldozer timeframe, AMD could provide a part competitive with much of Intel's lineup at a lower cost. It was only at the end, when they kept falling farther and farther behind, that it was a problem. For a few years there, you could actually _SEE_ in Intel's pricing where AMD's top part was, because there would be a bunch of parts all clustered below some number (say $200) and then a big price jump between every part above that line.

And so AMD had a real problem when you went into the $RETAILER looking at a $600 laptop, because while their laptop might have been better than the similarly priced Intel one, what you would hear is "amd sucks", and so people would actually pick the inferior product.

Sure, buy what you want, and competition certainly brings down prices, I don't disagree.

But making a low-cost product was not what AMD set out to do at the outset, so that's not really a defense of the technical flaws in Bulldozer's design. Sure, when they realized it was a trainwreck, they cut prices. Everyone does that, though, and that wasn't plan A.

Nobody is going to go through the expense of R&D and design and tapeout and then just not sell the product because it sucks/"missed expectations". You adjust the price to wherever it needs to be to sell the product.

Even in laptop the bulldozer chips were way power-hungry (actually this matters a lot more than in desktop) and just not that good a performer.

It was Intel's CEO's job to smile and sell hyper-clocked 14nm chips going against TSMC 7nm and it was AMD's CEO's job to smile and sell bulldozers going up against sandy bridge. That's what officers of the company do, even when they know it's shit. You go to war with the army you have, not the one you want, and you go to market with the product you have, not the one you want.

Yeah, it's good, but the author forgets to mention some other bad chips from before the late 1990's

- The Intel i432 - too far ahead of its time, an Itanium for the 1980s. https://en.wikipedia.org/wiki/Intel_iAPX_432

- The TI TMS320 series of DSPs. So full of silicon bugs it hurt TI badly.

- The Transputer T9000 - very ambitious, but vapourware for so long it killed its parent company. https://en.wikipedia.org/wiki/Transputer#T9000

The Cell processor was not terrible in the PS3, and I doubt you ever worked on it. So talk about 'not the best-researched'. You can find many people singing its praises, including me.
Haha! I've spent months tuning code to run on the Cell, and I despise that thing.

Sony gave you 6 of the 8 SPE cores to use (I think they reserved two, but it's been ages). They are indeed very fast; however, they have no cache-coherent access to main RAM and only 256k of memory for each element. So, you have to meticulously write DMA scheduling code to keep them fed. If you're a simpleton like me, you double buffer your SPE memory, cutting it in half, so 128k to work with, 128k for paging into, and you hope to be done paging before it's needed. Latency to memory is on the order of 2,000 cycles to first byte, but then they arrive fast.

So, what you do is decompose your problem into data streams that can be crunched through, but in such a way that you minimize the need to randomly access much memory. It's often cheaper to recompute things locally than to fetch them from RAM. Random access into your RAM is pointless, so you have to marshal all your input into DMA buffers, do some work, marshal all your output into other DMA buffers, and send back to host CPU.
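
(For anyone who hasn't seen the pattern, here is a minimal sketch of that double-buffering scheme. It is a simulation in plain Python, not SPE code: a worker thread stands in for the DMA engine, and `dma_get`, `crunch` and `process_stream` are hypothetical names. The 256k / 128k split is taken from the description above; everything else is an assumption.)

```python
from concurrent.futures import ThreadPoolExecutor

LOCAL_STORE = 256 * 1024   # bytes of per-SPE memory, per the description above
HALF = LOCAL_STORE // 2    # one half to work on, one half to page into

def dma_get(main_ram: bytes, offset: int) -> bytes:
    """Stand-in for an asynchronous DMA transfer of one buffer's worth."""
    return main_ram[offset:offset + HALF]

def crunch(chunk: bytes) -> int:
    """Hypothetical streaming kernel; the real work would go here."""
    return sum(chunk)

def process_stream(main_ram: bytes) -> int:
    total = 0
    # A single worker plays the role of the DMA engine.
    with ThreadPoolExecutor(max_workers=1) as dma:
        pending = dma.submit(dma_get, main_ram, 0)   # prime the first buffer
        offset = 0
        while offset < len(main_ram):
            current = pending.result()               # wait for the fill to land
            offset += HALF
            if offset < len(main_ram):
                # Start the next transfer before touching `current`, so the
                # fetch overlaps with the computation below.
                pending = dma.submit(dma_get, main_ram, offset)
            total += crunch(current)
    return total

data = bytes(range(256)) * 2048   # 512 KiB of input, i.e. four 128 KiB chunks
print(process_stream(data) == sum(data))
```

The point of the ordering is that the next fetch is always in flight while the current buffer is being crunched, so the long first-byte latency is only paid in full once, up front.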

Anyhow, I got this working. Meshes were being skinned at very high rate, but it was very frustrating. The PPE was really slow, so you had to offload as much as you could to those SPE's. But hey, I may be complaining, but it sure beats dealing with the "Emotion Engine" on the PS2. I can tell you which emotion that engine brings up.

In the early years, the SPUs were not all functional due to the fabbing process. The ones that had all 8 functional ended up in servers, and the ones with about 6 ended up in PS3s. This still happens all the time with clock speeds, and core counts on modern processors today. I’m sure the fabrication process improved over time, but they disabled the 2 cores to maintain backwards compatibility.
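
(A toy illustration of why binning like this helps yield. It assumes each SPE is independently good with some probability, which is a simplification, and the 0.9 below is a made-up number purely for illustration, not real fab data. `p_at_least` is a hypothetical helper.)

```python
from math import comb

def p_at_least(good_needed: int, total: int, p_good: float) -> float:
    """Probability that at least `good_needed` of `total` units work,
    assuming each unit is independently good with probability `p_good`."""
    return sum(
        comb(total, k) * p_good**k * (1 - p_good) ** (total - k)
        for k in range(good_needed, total + 1)
    )

P_GOOD = 0.9   # hypothetical per-SPE yield, for illustration only

all_eight = p_at_least(8, 8, P_GOOD)   # dies usable as full 8-SPE parts
six_or_more = p_at_least(6, 8, P_GOOD) # dies usable if only 6 SPEs must work

print(f"{all_eight:.3f} {six_or_more:.3f}")
```

Whatever the real per-unit yield was, accepting dies with a couple of dead units always makes a larger fraction of them sellable than requiring all eight to work.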

Unrelated: Every time I’m reminded about Cell I’m reminded of the OtherOS fiasco. I purchased a PS3 for the processor solely and I was very upset when I only got a $2 check for it. I never cashed it.

Same for me. I was very angry that Sony got away with that.
I still have a launch-edition PS3 on firmware 1.01 that I got on launch day (wife and I were fortunate enough to be able to buy two and stash one). I've lost all kinds of stuff in moves and etc. since, but that thing will have to be pried from my cold dead hands.
Sounds like you learned a lot and are a better programmer for it. Pretty much nothing has changed: you can either hope Moore's law bails you out somehow, or you can take what you have learned and apply that to the reality of whatever hardware you happen to be optimizing, CPU or GPU. Sorry it's painful, but to eke out max performance it's going to be hard at times.
For every person singing its praises, there are dozens of game developers who were singing with gladness when it was gone. The PS3 devs I've spoken with (you aside) universally hated the platform and spoke of how much more dev time it took to launch games on the platform to achieve mediocre results.

If the chip were so wonderful to work on, then it would still be in use today as the theoretical performance per area beats everything else by a wide margin.

Roadrunner was built in 2008. It would still be just barely off the top 500 list in 2021, but was decommissioned just FIVE years later in 2013. Its x86 replacement was already underway in 2010 TWO years after its launch.

I'm glad you got to work with the architecture you loved for so many years, but I think the rest of the world disagrees with your assessment.

It probably was spectacular once you knew how to work with it. Like the Atari Jaguar though, getting the performance needed out of such a highly parallel architecture took a lot of time and investment. With cross-platform games really taking off during that time, it was a strategic mistake IMO.
That's an enthralling tale, but perhaps you could share why you feel it deserved praise-singing to begin with, and also what titles you worked on, considering many developers were complaining about it when it was current console architecture, and you don't even need to do much of a Google Search to find people bitching about it.
>Google Search to find people bitching about it.

Seriously? You can find people bitching about anything on Google Search. The fact is most people just weren't prepared for multi-core data oriented programming in 2006.

List my titles: no you first

> You can find people bitching about anything on Google Search

And in this thread, you can find credible people with specific complaints about the Cell processor.

> List my titles: no you first

That’s unfortunate. If you’re not full of shit, how could anyone possibly know?

> List my titles: no you first

One of the cardinal rules of argumentation is that the burden of proof is upon the person making the claim.

You've made it. Now back it up.

I guess you must have been on a college debate team.
> You can find many people singing its praises, including me.

Until today, I’ve never once seen someone “singing its praises” that’s actually written code for one. At best, they’d curse it under their breath while saying it had its benefits. Usually, however, it was a full-throated rant about how bad the experience was.

It was surprisingly useful for some high performance computing niches. It was in a weird time. FPGAs were available but weren’t as performant as they are today. GPUs were around but not nearly as powerful or flexible given some workloads.
Every single one of them came out a better coder. They might have been dragged kicking and screaming to the multi-core but they would've had to get there in the end.
> Every single one of them came out a better coder.

Sure that may be true, but that does not mean they are singing its praises either.

Just look at this very post on HN where folks who’ve written code for it have commented on the experience, how many would you say are:

- singing its praises (you)

- cursing it under their breath while saying it had its benefits (few)

- full throated rant about how bad the experience was (few)

IDK - I guess most experienced low level coders hate computers, it doesn't matter what the CPU. People are lazy. I understand that it was hard but it doesn't make it a WORST CPU EVER MADE
Mod -1: Rude
Personally I can find something to like in most architectures.

Cell (for example) was an asymmetric/hybrid multicore CPU; Apple Silicon is perhaps a modern example of asymmetric performance vs. efficiency cores, and also features special-purpose accelerator cores such as the neural engine.

The 432 had capability-based addressing. Speed-over-security has had a good run, but with some disastrous consequences. We may be seeing the return of capabilities with CHERI/ARM.

The 960 was an early superscalar design, supported tag bits, and was also a successful product.

Obligatory mention:

"RISC instruction sets I have known and disliked."

https://www.jwhitham.org//2016/02/risc-instruction-sets-i-ha...

https://news.ycombinator.com/item?id=11607119

I might also say that Sun's UltraSPARC was constantly beaten by Fujitsu SuperSPARC. It would have been better to outsource.

SuperSPARC was an earlier TI manufactured part. Fujitsu was (and I think still is) SPARC64, which was a nice series of parts, originally designed by HAL. I used to own a Fujitsu server - fast and built like a brick outhouse.
I remember evaluating the 960 for an embedded router project and it was quite a nice ISA. Plus the 66 MHz CA part was fast for the price at the time.
The i960CA was one of the first superscalar microprocessors. (I wrote a third-party commercial instruction scheduler for it that operated on assembly code.) It was pretty nice, certainly in line with the other 32-bit RISCy ISAs of the time. My impression is that its relative lack of success was due to Intel internal politics.
Yes, within Intel it was thought that management would not push the 960, since if it did, the press would pick that up as validation that RISC is better. But for embedded applications it was very successful; I was shipping hundreds of thousands of them per month at one point.
i860 did well in embedded applications and for a while was the mainstay in most RAID controllers and network communication processors. Not what Intel wanted from it, but it did have a long life in such applications. I spent many years working on the i860 and i960 and learned to live with their oddities.

As for the Cell, it was an overly complex architecture that had remarkable performance under very optimized code. The hope was that hand-tuned libraries would address this, and compiler optimizations would take care of the rest. Neither happened in a meaningful way. We did two major projects with the Cell, using it for real-time HDTV compression/direct broadcast applications.

Another one not on the list was the Inmos Transputer. Again, similar to the Cell: very complex and fast for its time, but it was not easy to achieve that performance. That was my first job as an EE - we used it on a GPS receiver ISA card in the early days of GPS. It was a good choice, as it was very fast and could keep up with the signal processing, which allowed us to roll out code updates to add major features as various changes to GPS signals were rolled out (P-code on L2, SA being turned off, and later CA code on L2 being unencrypted). Our competitors had to redesign ASICs to get these new features, which meant long product cycles and hardware replacement.

Today I find myself doing a lot on the M1 series, as well as Epyc. Now you can give zero shits about clean optimized code and it still runs amazingly fast. Last time I had to do assembler or intrinsics was many many years ago - and I sort of miss that intimacy with the hardware to get the most out of it.

I think you mean 960 in RAID and comm controllers. The 860 had incredibly bad, almost unbelievably slow context switches. You’d never ever use it in a controller. A dedicated render pipeline is pretty much all it was good for, for some value of ‘good’.
I had the same reaction. The i860 and i960 were very different beasts. I owned an 860-based Oki/Stardent workstation, bought for peanuts at the latter company's fire sale, for a while. Later I found the 960CA (in particular) in many storage/network devices. So I kind of know both, but I would never speak of them as if they were the same. Other than sharing a corporate logo, they had little to do with one another.
At least the 960 was somewhat usable. Many variants were created, and several were widely used in embedded products for quite a few years. The 860, however, was Just Crap. Full stop. End of story. IIRC it had weird double-instruction modes that compilers just couldn't handle, and if you used them anyway (for very necessary performance) then handling exceptions properly was all but impossible. Definitely gets my vote for worst ever.
I worked on an unreleased third-party C compiler for the i860. It wasn't that compilers couldn't handle the double-issue float mode, it was more that it was worthless in real-world code due to the entry/exit latency. It had high performance on paper but not in reality, which was exactly the lesson that Intel did not learn for the Itanium.
Interesting that Intel has such an impressive record of failed designs. Itanium, 860, and iAPX 432 - all anti-classics of their time.
I remember articles from Byte hyping it (the 860), also adverts for accelerator cards.

It runs rings around workstations!

"We now know that core 2 dropped all kinds of safety features resulting in the Meltdown vulnerabilities."

Curiously, every other out-of-order chip designer except for AMD also designed CPUs with Meltdown flaws. That's per their own documentation: ARM, IBM (both Power and mainframe), SPARC, and I think MIPS, but they weren't entirely clear about it.

Yes, and no mention of the Transmeta Crusoe either.
It seems like Intel was in some ways like Microsoft. Their revenues were so high that they could survive spectacular failures and still keep going.
> The i960 was take 2 and their joint venture called BiiN also shuttered.

I have an old X-11 terminal I believe has an i960 in it. I’m shocked that thing was capable of running CDE desktops when it stutters on FVWM over a network much faster than it ever was intended to see.

What games were able to make full use of the Cell?