by markhahn 692 days ago
I'm not really sure why you say "one FPU per core". Are you talking about the programmer-visible model? All current mainstream processors have multiple parallel FP functional units, with multiple FP/SIMD instructions in flight. AFAIK, the in-flight instructions come from both threads if SMT is enabled (that is, they are tracked like all other uops). I'm also not sure why you say that enabling SMT "makes the queue longer". Do you mean the pipeline? Or just that the threads are likely to conflict?

Yes, but well-optimized, math-heavy software will already max out the superscalar width of the FPU, i.e. a single CPU thread can already keep multiple FPU-heavy instructions in flight at the same time. If you run two copies of such software on the same FPU, you will only gain overhead. I guess by queue he meant the processor's internal work queue; the pipeline is only half of the picture. Processors keep a small data-dependency graph of the micro-ops they still have to execute, and that graph is what implements the machine-code instructions that are currently in flight.
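
To make the saturation argument concrete, here is a toy issue model in Python. It is only a sketch under made-up assumptions: `units`, `latency`, and the chain counts are hypothetical numbers, each thread is reduced to a set of independent dependency chains, and nothing else about a real core (front end, caches, queue sizes) is modeled.

```python
def cycles_to_finish(threads, units=2, latency=4):
    """Cycles needed for all threads to finish on one shared core.

    threads: list of (independent_chains, ops_per_chain) tuples.
    units:   FP ops that can issue per cycle (hypothetical value).
    latency: cycles before a dependent op may issue (hypothetical value).
    """
    # One entry per dependency chain: [ops remaining, cycle when next op is ready]
    chains = [[ops, 0] for n_chains, ops in threads for _ in range(n_chains)]
    cycle = 0
    while any(c[0] > 0 for c in chains):
        issued = 0
        for c in chains:
            if issued == units:
                break  # all FP units are busy this cycle
            if c[0] > 0 and c[1] <= cycle:
                c[0] -= 1               # issue one op from this chain
                c[1] = cycle + latency  # its successor must wait for the result
                issued += 1
        cycle += 1
    # The last op issued in cycle - 1 and needs `latency` cycles to complete.
    return cycle - 1 + latency


def smt_speedup(thread):
    """Two copies run back to back vs. two copies sharing the core via SMT."""
    back_to_back = 2 * cycles_to_finish([thread])
    shared = cycles_to_finish([thread, thread])
    return back_to_back / shared


saturating = (8, 100)     # 8 independent chains: enough to fill 2 units x 4 cycles
latency_bound = (1, 100)  # 1 serial chain: the units sit idle most of the time

print(smt_speedup(saturating))
print(smt_speedup(latency_bound))
```

In this model the thread with eight independent chains already issues two ops every cycle on its own, so the SMT copy gains essentially nothing, while the single serial chain leaves most issue slots empty and the second thread fills them for a 2x gain. The model has no SMT overhead at all, so it can only show "no gain", not the slowdown.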
> I guess by queue he meant the processor internal work queue...

Yes, I meant the internal one. Also, when you enable SMT, a small tag is added in front of every instruction, noting which logical core on a given physical core owns that instruction. So instead of tagging every instruction with a core-ID, you add a longer tag in the form of core-ID/logical_core-ID.

This extra tagging also makes instructions bigger, so the queue can hold fewer instructions, adding fuel to the already chaotic and choked FPU logistics.

As a result, if you're saturating your FPU(s), SMT can't save you. In fact, it can make you slower.