by snovv_crash 3597 days ago
The issue is the long lag time between new ideas being implemented at a design level, and the many iterations of fabbing and tweaking that need to take place before it can actually be sold. The majority of people in tech are too used to something being written in the morning and deployed in the afternoon to understand what it is like having a 3-month lag in your testing cycle, and minimum twice that till release.

Just like Intel had the P4 hole that it had to drag its way out of, so now AMD has had Bulldozer. Notice how Intel has been quite conservative with each individual tick/tock, trying to keep their pipeline full. Doing crazy changes risks causing a pipeline stall which could last years. Each new architecture is risky, and AMD screwed up with Bulldozer. From early signs it looks like Zen is a winner, hopefully AMD can stick with it for a while.


I think bulldozer was just ahead of its time. I am fortunate to have one, and it has aged a lot better than the same-price alternatives from Intel due to the more multithreaded nature of e.g. games nowadays. That's not much comfort for AMD, since being ahead of your time is just another way to fail, but at least AMD had vision.

Mankind Divided's recommended specs are FX-8350 or i7 3770. The price difference between the two in their heyday was $100 in AMD's favor.

Price was always in AMD's favor. The problem is more and more the energy consumption. Also, Intel's i-series was really solid, at least until Broadwell (I haven't seen much of Skylake yet). That especially lost AMD a lot of ground in the server space. And a lot of "high end" gamers favored the Intel i7 series as well, even when they weren't as cheap as AMD.
The i7 was a lot better for gaming than the 8350 was at the time. Hell, an i3 was better at the time. The 8350 was priced accordingly. There was no price advantage. What I'm saying is that AMD's design has aged a lot better.
AMD was (always) cheaper, not better. You can be cheaper and worse and people will still buy the worse product because it's cheaper, even if the price/quality ratio on Intel would've been better.
You obviously haven't actually checked the prices.
How exactly?

I see the evidence of that happening with AMD GPUs versus Nvidia, but versus Intel processors?

Can you explain?

When Bulldozer was released, most software only used one or two threads. Which meant that the 8-thread FX was slower than the 2-thread i3 because the extra threads did nothing and it had lower single-thread performance.

Now that newer software is using more threads, the old FX gets a big performance boost while the old i3 is only the same speed it ever was.
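
To put rough numbers on that argument, here is a toy model. The per-thread performance figures are invented for illustration and are not benchmarks of any real chip; the only point is that throughput is capped by how many threads the software can actually keep busy.

```python
# Toy throughput model. All numbers are hypothetical, not measurements.
def effective_throughput(hw_threads: int, per_thread_perf: float, sw_threads: int) -> float:
    """Work per unit time when the software keeps `sw_threads` threads busy."""
    busy = min(hw_threads, sw_threads)  # extra hardware threads sit idle
    return busy * per_thread_perf

# Hypothetical parts: many weaker threads vs. few stronger ones.
FX_LIKE = {"hw_threads": 8, "per_thread_perf": 0.7}
I3_LIKE = {"hw_threads": 2, "per_thread_perf": 1.0}

for sw_threads in (1, 2, 4, 8):
    fx = effective_throughput(sw_threads=sw_threads, **FX_LIKE)
    i3 = effective_throughput(sw_threads=sw_threads, **I3_LIKE)
    winner = "FX-like" if fx > i3 else "i3-like"
    print(f"{sw_threads} software threads: FX-like={fx:.1f} i3-like={i3:.1f} -> {winner}")
```

With one or two software threads the 2-thread part wins on per-thread speed; once the software scales past two threads, the 8-thread part keeps gaining while the other stays flat.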

How did Intel lose ground with Broadwell?
IIRC it took much longer to deliver than expected, didn't improve much upon Haswell, sold in a limited set of SKUs, then was succeeded by Skylake a few months later.
AMD started work on Zen back in 2012 after it became clear the Bulldozer was a complete failure. It's taken them 4-5 years to get it ready for release. They still had Steamroller and Piledriver in the pipeline, so they released those anyway, despite the fact that line of processors is now dead.

It took about the same amount of time for Intel to release the Core and Core 2 architectures after realising they had made a huge mistake with the Pentium 4.

I've heard some people theorising that Intel might have worked their current architecture into a corner and they might have problems innovating out of it. I guess we will see when information about Zen's performance shows up.

What is shown on the slides is not that innovative. If anything, they seem to have thrown the bulldozer innovations out of the window and moved the architecture closer to the sandybridge-era processors.
> If anything, they seem to have thrown the bulldozer innovations out of the window

That's basically the point of Zen, Bulldozer was an architectural dead-end that wasn't going anywhere.

Besides, it's not like Intel have massively innovated since Sandybridge. Ivy, Haswell, Broadwell and Skylake are little more than successive perfections of the Sandybridge architecture.

It's hard to tell from the slides, but it looks like Zen is a much wider architecture than Intel, with 10 execution ports (4 ALU, 2 AGU, 2 FP ADD, 2 FP MUL). Sandybridge had 6, Haswell and later have 8 execution ports. Bulldozer had 4 integer execution ports plus 2 float ports, which are shared between each pair of cores.

The most interesting thing about those slides is the layout of the blocks marked "Scheduler". Intel chips all have a single scheduler, Bulldozer had one float scheduler (shared) and one integer scheduler. But I'm counting 7 schedulers on the Zen slides, one float scheduler managing the 4 floating point execution ports and 6 integer schedulers, one for each execution port.

> It's hard to tell from the slides, but it looks like Zen is a much wider architecture than Intel, with 10 execution ports

The text mentions it has to fuse the four FP ports to do a single 256-bit AVX per cycle. This is significantly less wide than Intel architectures (half/quarter). We can interpret the width thus as 4+2+1 ports, which is in the Haswell ballpark.

What is maybe more telling here is the 16-byte load/stores, Haswell is doing 32-byte at the same rate. It points to Zen abandoning FP bandwidth in both client and server. Perhaps they want to rely on GPGPU with the on-chip GPU to do compute workloads?

> The most interesting thing about those slides is the layout of the blocks marked "Scheduler". Intel chips all have a single scheduler, Bulldozer had one float scheduler (shared) and one integer scheduler. But I'm counting 7 schedulers on the Zen slides, one float scheduler managing the 4 floating point execution ports and 6 integer schedulers, one for each execution port.

Depends what they mean by Scheduler. If it means reservation stations for micro-ops, then that's already the case in other micro-architectures. If Scheduler means assigning micro-ops per port, then there can logically only be a single one.

> The text mentions it has to fuse the four FP ports to do a single 256-bit AVX per cycle. This is significantly less wide than Intel architectures (half/quarter). We can interpret the width thus as 4+2+1 ports, which is in the Haswell ballpark.

4+2+2, no need to combine all 4 ports, just the two multiplies or the two adds.

The text is speculation by the journalist. It's possible that each port is actually 256 bits wide and fusing them is only needed for the 512bit AVX instructions that Intel don't even support yet.

Even if AMD are splitting the 256 bit fpus in half, that is still a huge win for average code, because 128bit SSE instructions are much more common than AVX instructions, and AMD can execute up to four of them per cycle.
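
A quick sketch of the arithmetic behind both readings. The port counts (2 FP add, 2 FP mul) are what the slides appear to show; the port widths and the rule that a wide op fuses ports of the same kind are assumptions from this discussion, not confirmed details of Zen.

```python
import math

# Port counts as read off the slides in this thread: 2 FP add + 2 FP mul.
FP_PORTS = {"add": 2, "mul": 2}

def fp_ops_per_cycle(op_bits: int, port_bits: int, ports=FP_PORTS) -> int:
    """Peak FP ops of `op_bits` width per cycle, assuming a wide op
    fuses neighbouring ports of the same kind (an assumption)."""
    ports_per_op = math.ceil(op_bits / port_bits)
    return sum(n // ports_per_op for n in ports.values())

# (port width, op width) pairs; both port widths are guesses.
cases = [(128, 128), (128, 256), (256, 256), (256, 512)]
for port_bits, op_bits in cases:
    peak = fp_ops_per_cycle(op_bits, port_bits)
    print(f"{port_bits}-bit ports, {op_bits}-bit ops: {peak} per cycle")
```

Under the 128-bit-port reading this gives four SSE ops or two 256-bit AVX ops (one add, one multiply) per cycle; under the 256-bit-port reading, fusing would only matter for 512-bit ops.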

Even Intel disable the upper half of their FPU most of the time to save power; AVX instructions get split into two 128bit micro-ops until a threshold is reached and the upper half powers up.

> If Scheduler means assigning micro-ops per port, then there can logically only be a single one.

I assume that means one re-order buffer per port. Bulldozer already had two re-order buffers, one for float instructions and one for integer instructions, which proves multiple ROBs for different ports are possible. You just need to track dependencies across ROBs.

I'm guessing that tracking dependencies across 7 schedulers is not much harder than tracking dependencies across 2.
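
A toy illustration of why the number of schedulers need not change the dependency logic: if every completed result tag is broadcast to all queues, each scheduler just checks its own micro-ops against one shared ready set. Everything here (the `MicroOp` and `Scheduler` classes, one issue per scheduler per cycle, single-cycle latency) is a made-up simplification, not a description of how Zen or Bulldozer actually work.

```python
from dataclasses import dataclass, field

@dataclass
class MicroOp:
    name: str
    dest: str            # tag this op produces
    srcs: tuple = ()     # tags this op waits on

@dataclass
class Scheduler:
    name: str
    queue: list = field(default_factory=list)

    def pick_ready(self, ready_tags):
        """Remove and return the oldest op whose sources are all ready."""
        for op in self.queue:
            if all(src in ready_tags for src in op.srcs):
                self.queue.remove(op)
                return op
        return None

def simulate(schedulers, initial_ready=()):
    """Issue at most one op per scheduler per cycle; return a per-cycle trace."""
    ready = set(initial_ready)
    trace = []
    while any(s.queue for s in schedulers):
        issued = []
        for s in schedulers:
            op = s.pick_ready(ready)
            if op is not None:
                issued.append((s.name, op))
        if not issued:
            raise RuntimeError("no op can issue: unresolved dependency")
        # Broadcast: results become visible to every scheduler next cycle.
        ready.update(op.dest for _, op in issued)
        trace.append([f"{sched}:{op.name}" for sched, op in issued])
    return trace

scheds = [
    Scheduler("int0", [MicroOp("add", "r1", ("r0",))]),
    Scheduler("int1", [MicroOp("mul", "r2", ("r1",))]),
    Scheduler("fp", [MicroOp("fadd", "x1", ("x0",)),
                     MicroOp("fmul", "x2", ("x1", "r2"))]),
]
for cycle, ops in enumerate(simulate(scheds, initial_ready={"r0", "x0"}), start=1):
    print(cycle, ops)
```

Adding more `Scheduler` objects to the list changes nothing in `simulate`; in real hardware the cost is in the wiring for the broadcast, which this sketch ignores.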

With the current state of the tech press, that's probably a good idea, as any difference from how Intel does things is spun as a negative. Take this paragraph from the AnandTech article, for example: "Unlike Bulldozer, where having a shared FP unit between two threads was an issue for floating point performance, Zen’s design is more akin to Intel’s in that each thread will appear as an independent core and there is not that resource limitation that BD had. With sufficient resources, SMT will allow the core instructions per clock to improve, however it will be interesting to see what workloads will benefit and which ones will not." Intel-style SMT actually has more contention for shared processor resources than Bulldozer did, not less, because far more is shared. Despite that, AMD's switch to it is being spun as a positive simply because it's closer to what Intel do.
I'd say that it's not just spin and trying to be closer to what Intel's architecture is.

The FP contention between the cores in a Bulldozer module makes all recent AMD chips perform objectively worse in most benchmarks than their peers from Intel.

Intel's architecture isn't a priori a goal to achieve. Intel's performance in real-world workloads is a good goal.

There are some heavily-threaded, integer-heavy workloads that Bulldozer and related parts are still incredibly competitive at, even compared to current-gen Intel parts. For the right workload, a Bulldozer-family processor can be a real screamer and they are priced incredibly aggressively. We should recognize, though, that the architecture is high performance only for these specific workloads.

Perhaps AMD should have pursued more innovative architectures. I am not saying that Intel's is perfect. But it is important to note that for current general purpose computing workloads, Intel's architecture is superior to Bulldozer.