by funcDropShadow 690 days ago
Yes, but well-optimized, math-heavy software will already max out the superscalar width of the FPU, i.e. one CPU thread can already schedule multiple FPU-heavy instructions at the same time. If you run two instances of such software on the same FPU, you will only gain overhead. I guess by queue he meant the processor's internal work queue; the processor pipeline is only half of the picture. Processors keep a small data-dependency graph of the micro-instructions they have to perform, which is used to implement the machine code instructions that are currently in flight.

> I guess by queue he meant the processor internal work queue...

Yes, I meant the internal one. Also, when you enable SMT, a small tag is added in front of every instruction, noting which logical core owns this instruction for a given physical core. So instead of tagging every instruction with a core-ID, you add a longer tag in the form of core-ID/logical_core-ID.

This extra tagging also makes each queued instruction bigger, so the queue can hold fewer instructions, adding fuel to the already chaotic and choked FPU logistics.
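To put rough numbers on the tagging argument, here is a back-of-the-envelope sketch. Every figure in it (queue size in bits, micro-op size, tag widths) is made up for illustration and does not describe any real CPU; it only shows the arithmetic of "wider tag, fewer entries in a fixed-size queue".

```python
def queue_capacity(queue_bits: int, instr_bits: int, tag_bits: int) -> int:
    """Number of whole entries that fit in a fixed-size queue
    when each entry is an instruction plus an ownership tag."""
    return queue_bits // (instr_bits + tag_bits)

# All values below are hypothetical, chosen only to make the effect visible.
QUEUE_BITS = 4096      # assumed storage budget of the internal queue
INSTR_BITS = 64        # assumed size of one queued micro-instruction
CORE_ID_BITS = 3       # assumed tag width without SMT (core-ID only)
LOGICAL_ID_BITS = 1    # assumed extra bit for the logical core (2-way SMT)

smt_off = queue_capacity(QUEUE_BITS, INSTR_BITS, CORE_ID_BITS)
smt_on = queue_capacity(QUEUE_BITS, INSTR_BITS, CORE_ID_BITS + LOGICAL_ID_BITS)

print(f"entries with SMT off: {smt_off}")
print(f"entries with SMT on:  {smt_on}")
```

With these invented numbers the loss is a single entry, so the size of the effect depends entirely on the real widths, which I'm not claiming to know.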

As a result, if you're saturating your FPU(s), SMT can't save you. In fact, it can make you slower.
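The same point as a toy throughput model. This is a sketch under loud assumptions: the port count, the per-thread demand, and the flat `smt_overhead` penalty are all invented parameters, not measurements, and real schedulers are far messier.

```python
def fpu_throughput(ports: int, demand_per_thread: list[float],
                   smt_overhead: float = 0.0) -> float:
    """Toy estimate of FPU ops retired per cycle.

    ports             -- assumed number of FPU issue ports on the physical core
    demand_per_thread -- ops/cycle each hardware thread could issue on its own
    smt_overhead      -- assumed fraction of FPU capacity lost when two or
                         more threads share the core (hypothetical knob)
    """
    capacity = float(ports)
    if len(demand_per_thread) > 1:
        capacity *= 1.0 - smt_overhead
    # The FPU cannot retire more than its capacity, no matter the demand.
    return min(sum(demand_per_thread), capacity)

PORTS = 2          # hypothetical
OVERHEAD = 0.05    # hypothetical 5% sharing penalty

# One well-optimized thread already fills both ports.
saturated_one = fpu_throughput(PORTS, [2.0])
# A second copy on the sibling logical core has nothing left to use.
saturated_two = fpu_throughput(PORTS, [2.0, 2.0], OVERHEAD)

# Light threads leave ports idle, so here SMT does help.
light_one = fpu_throughput(PORTS, [0.5])
light_two = fpu_throughput(PORTS, [0.5, 0.5], OVERHEAD)

print(saturated_one, saturated_two)
print(light_one, light_two)
```

In this model the saturated pair retires slightly less in total than the single thread did alone, while the light pair roughly doubles its throughput, which matches the "only helps if you leave the FPU idle" argument above.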