by solarexplorer 4629 days ago
This is actually what hyperthreading is all about: cache misses. I missed that in the article. There are more things missing actually, but I guess it would be too much to explain it all in a single article: things like caches, coherence protocols, prefetching, and memory disambiguation. Registers are also much more complex because you have things like register renaming, result forwarding, etc. In the end there are simply far fewer registers than memory locations; that's why you can build faster registers than memory.

I thought hyperthreading was able to go beyond this, and e.g. execute the two streams in parallel if one is hitting the FPU and the other is doing integer work, even if neither one is stalled.

And you're right, it's missing a lot because I'm writing an article, not a book. It is fun to explore details, but ultimately you have to stop somewhere.

That was the impression I had too, but if so I can see how "this is actually what hyperthreading is all about" would make sense. Two streams of code are unlikely to have long segments of just-FPU and just-integer work respectively, and it is even less likely that those segments will happen to line up during execution. It happens, sure, but the gains would be smallish.

On the other hand, long periods of no cache misses followed by long periods of waiting after a cache miss are exactly what you expect from real code (especially optimized code). So I'd think that you'd have much bigger gains from that. The same goes for branch misprediction.
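To make the "compute for a while, then wait on a miss" argument concrete, here is a rough toy model, not a description of any real CPU. It assumes a single issue slot per cycle, threads described as (compute_cycles, miss_latency) bursts with at least one compute cycle each, and a fixed-priority pick among ready threads. The names cycles_alone and cycles_shared and the burst lengths are made up for illustration.

```python
from collections import deque


def cycles_alone(bursts):
    """Cycles for one thread that owns the core: every miss stalls the core."""
    return sum(compute + miss for compute, miss in bursts)


def cycles_shared(threads):
    """Cycles for several threads sharing one issue slot.

    Each cycle, the first thread (in list order) that is not waiting on a
    miss executes one compute cycle. Assumes every burst has compute >= 1.
    """
    state = [{"bursts": deque(t), "left": t[0][0] if t else 0, "ready_at": 0}
             for t in threads]
    cycle = 0
    while any(s["bursts"] for s in state):
        for s in state:
            if s["bursts"] and s["ready_at"] <= cycle:
                s["left"] -= 1
                if s["left"] <= 0:
                    # Burst finished: the thread now waits out its miss.
                    _, miss = s["bursts"].popleft()
                    s["ready_at"] = cycle + 1 + miss
                    if s["bursts"]:
                        s["left"] = s["bursts"][0][0]
                break  # only one thread uses the slot per cycle
        cycle += 1
    # The run ends when the last outstanding miss has resolved.
    return max(s["ready_at"] for s in state)


if __name__ == "__main__":
    missy = [(20, 10)] * 100   # hypothetical: 20 cycles of work, 10-cycle miss
    no_miss = [(30, 0)] * 100  # hypothetical: pure compute, never stalls

    for name, t in (("with misses", missy), ("no misses", no_miss)):
        back_to_back = 2 * cycles_alone(t)
        shared = cycles_shared([t, t])
        print(name, "back-to-back:", back_to_back, "shared:", shared)
```

In this model the pair of miss-heavy threads finishes sooner when sharing the core than when run one after the other, while the pair of pure-compute threads gains nothing, since there are no stall cycles to fill. The size of the gain here says nothing about real hardware; it only depends on the invented burst lengths.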

Well, the gains are smallish. Real-world gains from hyperthreading are on the order of 10-20% when you load up a CPU with two threads.
Yeah, but when I said "smallish" I was thinking more on the order of 1%. I would consider 10% actual gains to be quite large given the craziness of what Hyperthreading tries to accomplish.
It may also be a matter of more fully utilizing multiple integer/floating-point units. Say, if the CPU has two integer units but the current code is only using up one of them, then it could run the second hyperthread on the other. I really don't know the details though.
Yes, hyperthreading (aka SMT), as implemented in Intel's processors, can execute instructions from several threads in the same clock cycle. Other processors, like Sun's Niagara, switch threads only on certain events like cache misses (this is known as SoEMT, switch-on-event multithreading). Workloads with a lot of cache misses are where both really shine.

Of course it's hard to write about a complex topic, choose the right details, and make it all seem simple. Thumbs up for trying!