| HN Mirror

Hacker News

	by slabity 1412 days ago
	SMT is usually disabled in these situations to prevent it from being a concern.
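
A minimal sketch, assuming a Linux host, of checking whether SMT is on and which logical CPUs are hyperthread siblings. It reads the kernel's `/sys/devices/system/cpu/smt/active` file and each CPU's `topology/thread_siblings_list`. The helper names are hypothetical. On kernels that expose it, writing `off` to `/sys/devices/system/cpu/smt/control` as root (or booting with `nosmt`) disables SMT without a trip into the BIOS.

```python
from __future__ import annotations

from pathlib import Path

SYSFS_CPU = Path("/sys/devices/system/cpu")


def parse_cpu_list(text: str) -> set[int]:
    """Parse the kernel's cpulist format, e.g. "0-3,8" -> {0, 1, 2, 3, 8}."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate empty input or trailing commas
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def smt_active() -> bool | None:
    """True/False from sysfs, or None if this kernel does not expose the file."""
    try:
        return (SYSFS_CPU / "smt" / "active").read_text().strip() == "1"
    except OSError:
        return None


def siblings_of(cpu: int) -> set[int]:
    """Logical CPUs sharing a physical core with `cpu` (including `cpu`)."""
    path = SYSFS_CPU / f"cpu{cpu}" / "topology" / "thread_siblings_list"
    return parse_cpu_list(path.read_text())


if __name__ == "__main__":
    print("SMT active:", smt_active())
    if (SYSFS_CPU / "cpu0" / "topology" / "thread_siblings_list").exists():
        print("cpu0 shares a core with:", siblings_of(0))
```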

nextaccountic 1412 days ago

Doesn't this leave some performance on the table? Each core has more execution ports than a single thread could reasonably use, precisely because two threads can run on a single core.

slabity 1412 days ago

In terms of throughput, technically yes, you are leaving performance on the table. However, in HFT throughput is heavily limited by I/O anyway, so you don't get much benefit from enabling it.

What you want is to minimize latency, which means you don't want to be waiting for anything before you start processing whatever information you need. To do this, you need to ensure that the correct things are cached where they need to be, and SMT means that you have multiple threads fighting each other for that precious cache space.
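
One common way to keep "the correct things cached where they need to be" is to pin the latency-critical thread to a single core so its working set stays in that core's private caches. A minimal sketch, assuming Linux and Python's `os.sched_setaffinity`; the function name and the choice of CPU are illustrative. Production setups usually also keep other work off that core (for example with the `isolcpus` kernel parameter), which this sketch does not do.

```python
import os


def pin_to_cpu(cpu: int) -> set[int]:
    """Restrict the caller to one logical CPU (Linux only) and return the new mask.

    Staying on one core keeps the hot loop's data in that core's L1/L2
    instead of refilling caches after every migration. With SMT off, no
    sibling hardware thread competes for those caches either.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("CPU affinity is not supported on this platform")
    os.sched_setaffinity(0, {cpu})  # pid 0 refers to the caller
    return os.sched_getaffinity(0)


if __name__ == "__main__" and hasattr(os, "sched_getaffinity"):
    # Hypothetical choice: the highest CPU we are already allowed to use.
    target = max(os.sched_getaffinity(0))
    print("pinned to:", pin_to_cpu(target))
```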

In non-FPGA systems I've worked with, I've seen dozens of microseconds of latency added with SMT enabled vs disabled.
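
A comparison like SMT enabled vs disabled is usually reported as latency percentiles rather than averages, since the tail is what hurts. A minimal sketch of that measurement: `fake_handler` is a hypothetical stand-in for real message processing, and Python's own overhead swamps microsecond-level effects, so this only illustrates the method (nearest-rank percentiles over `time.perf_counter_ns()` samples), not a realistic benchmark. One would run it once per SMT setting, with the process pinned to the same core each time.

```python
import math
import time
from typing import Callable, List


def measure(handler: Callable[[], None], iterations: int = 10_000) -> List[int]:
    """Time each call to `handler` in nanoseconds."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        handler()
        samples.append(time.perf_counter_ns() - start)
    return samples


def percentile(samples: List[int], pct: float) -> int:
    """Nearest-rank percentile of a non-empty sample list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


def fake_handler() -> None:
    # Hypothetical stand-in for decoding a message and updating state.
    sum(range(200))


if __name__ == "__main__":
    samples = measure(fake_handler)
    for pct in (50, 99, 99.9):
        print(f"p{pct}: {percentile(samples, pct)} ns")
    print(f"max: {max(samples)} ns")
```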

jnordwick 1412 days ago

Maybe 10 years ago that was the common approach, but cores now have so many extra resources (especially registers) that disabling SMT means giving up almost half the chip. If you can be cache-friendly enough, the extra cycles will make up for it.
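
How much cache the two hardware threads of a core actually share can be read from sysfs on Linux: in each `cache/index*` directory, `shared_cpu_list` names the logical CPUs sharing that cache, so with SMT on, the L1 and L2 entries typically list both siblings. A minimal sketch, assuming Linux; the helper names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

CACHE_DIR = Path("/sys/devices/system/cpu/cpu0/cache")
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert sysfs sizes such as '48K' or '2M' to bytes."""
    text = text.strip()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def describe_caches(cache_dir: Path = CACHE_DIR) -> list[tuple[int, str, int, str]]:
    """(level, type, size in bytes, sharing CPUs) for each cache cpu0 can see."""
    rows = []
    for index in sorted(cache_dir.glob("index*")):
        fields = {
            name: (index / name).read_text().strip()
            for name in ("level", "type", "size", "shared_cpu_list")
        }
        rows.append((int(fields["level"]), fields["type"],
                     parse_size(fields["size"]), fields["shared_cpu_list"]))
    return rows


if __name__ == "__main__":
    for level, kind, size, shared in describe_caches():
        print(f"L{level} {kind:<12} {size // 1024:>6} KiB  shared by CPUs {shared}")
```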

slabity 1412 days ago

No, this is not true at all. "The extra cycles" are exactly what you want to avoid in HFT. It doesn't matter how much processing throughput you can push through a single core with SMT enabled, because somewhere in the path (the broker, the exchange, or some switch in between) you will eventually hit a throughput limit that makes it irrelevant.

The only thing that matters at that point is latency, and unless you are cache-friendly enough to store your entire program in a single core's cache twice over, you would be better off disabling SMT altogether. And even if you could do that, it would not matter, because a single thread would be done processing a message by the time the next one comes in, at least at the 10-25 Gbps that exchanges can currently handle.
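
To put the 10-25 Gbps figure in perspective, here is a back-of-the-envelope sketch of how long one Ethernet frame occupies the link, which bounds how closely back-to-back messages can arrive. It assumes the standard 20 bytes of per-frame wire overhead (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap); the frame sizes are illustrative, not taken from any particular exchange feed.

```python
# Per-frame wire overhead on Ethernet: 7 B preamble + 1 B SFD + 12 B inter-frame gap.
ETHERNET_OVERHEAD_BYTES = 20


def wire_time_ns(frame_bytes: int, link_gbps: float) -> float:
    """Nanoseconds one frame occupies the link (1 Gbps = 1 bit per ns)."""
    bits = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return bits / link_gbps


if __name__ == "__main__":
    for gbps in (10, 25):
        for size in (64, 512, 1518):  # minimum, mid-size, and maximum standard frames
            print(f"{gbps} Gbps, {size:>4}-byte frame: {wire_time_ns(size, gbps):7.1f} ns")
```

At 10 Gbps a minimum-size frame takes about 67 ns on the wire (about 27 ns at 25 Gbps), so whether a single thread finishes before the next message arrives depends on how its per-message processing time compares with these gaps and on how bursty the feed is.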

In HFT, we're fine giving up half the registers in a core if it means we get an extra few microseconds of latency back.