| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by phire 3596 days ago

> The text mentions it has to fuse the four FP ports to do a single 256-bit AVX per cycle. This is significantly less wide than Intel architectures (half/quarter). We can interpret the width thus as 4+2+1 ports, which is in the Haswell ballpark.

4+2+2, no need to combine all 4 ports, just the two multiplies or the two adds.

The text is speculation of the journalist. There It's possible that each port is actually 256 bits wide and fusing them is only needed for the 512bit AVX instructions that Intel don't even support yet.

Even if AMD are splitting the 256 bit fpus in half, that is still a huge win over average code, because 128bit SSE instructions are much more common than AVX instructions, and AMD can execute upto four of them per cycle.

Even Intel disable the upper half of their FPU most of the time to save power, AVX instructions get split into two 128bit micro-ops unless until a threshold is encountered and the upper half powers up.

> If Scheduler means assigning micro-ops per port, than there can logically only be a single one.

I assume that means one Re-order buffer per port. Bulldozer already had two Re-order buffer, one for float instructions and one for interger instructions, which proves multiple ROBs for different ports are possible. You just need to track dependencies across ROBs.

I'm guessing that tracking deprbdiencies across 7 schedulers is not much harder than tracking deprbdiencies across 2.