Hacker News
by corresation 4697 days ago
Try with GOMAXPROCS set to the number of real cores on your test machine (i.e. ignore hyperthread cores).
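One way to find the number of real cores on Linux is to count the distinct (physical id, core id) pairs in /proc/cpuinfo and pass that count to the Go binary through the GOMAXPROCS environment variable. A minimal sketch, assuming the x86 Linux field names `physical id` and `core id`; the helper names `count_physical_cores` and `go_bench_env` are made up for illustration, and the sample text is synthetic rather than taken from any real machine.

```python
import os


def count_physical_cores(cpuinfo_text):
    """Count distinct (physical id, core id) pairs in /proc/cpuinfo-style text.

    Assumes the x86 Linux layout: one blank-line-separated block per logical
    CPU, each with "physical id" and "core id" fields.
    """
    cores = set()
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "physical id" in fields and "core id" in fields:
            cores.add((fields["physical id"], fields["core id"]))
    return len(cores)


def go_bench_env(physical_cores, base_env=None):
    """Return a copy of the environment with GOMAXPROCS set for a Go binary."""
    env = dict(os.environ if base_env is None else base_env)
    env["GOMAXPROCS"] = str(physical_cores)
    return env


# Synthetic sample (an assumption): one socket, 2 cores, 2 hardware threads each.
SAMPLE = "\n\n".join(
    f"processor\t: {cpu}\nphysical id\t: 0\ncore id\t\t: {cpu % 2}"
    for cpu in range(4)
)

if __name__ == "__main__":
    n = count_physical_cores(SAMPLE)
    print(n, go_bench_env(n, base_env={})["GOMAXPROCS"])
```

On a real machine the input would be the contents of /proc/cpuinfo, and the returned environment could be handed to whatever launches the benchmark.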

OK, I ran it with GOMAXPROCS=4: 3.6 seconds. Node with a cluster count of 4: 17.5 seconds.
What do you think the difference is between a "real core" and a "hyperthread core"?
An HTT CPU can still only execute one instruction at a time; there are often instructions that cause the CPU to have idle time (stalled waiting for data), and hyperthreading allows the CPU to spend that otherwise idle time making progress on a separate task list. However, this still means that the two scheduled threads are contending for the same execution unit... The parent is suggesting that this contention may cost more performance than HTT gains, a question that would be easily resolved with some benchmarking :D
Note: the above was a simplification, based on my understanding of HTT CPUs as of about 2008. Apparently things got more complicated in the last 5 years :D The bottom line remains that HTT can cause slowdown in some cases and you should benchmark with it turned off as well.
No, not even close. Each thread on a Haswell CPU, just as an example, has 8 execution ports. Each Haswell core has ten execution units. The CPU can retire way more than one instruction per cycle.
You are absolutely correct, but you could also afford to be a bit more polite. Sentences like "In other words, you have no idea what the difference is" might be true but they're also a bit rude.
They aren't absolutely correct at all, and aaron was actually close to the money. Thrownaway is fundamentally misrepresenting (or misunderstanding) how threads -- in an operating system sense, which is what we are talking about here -- relate to microcode and execution units in a core.
In theory
It's not just theory, you typically get about 3 instructions per cycle in practice.
There is a huge difference between the two. Hyperthreaded cores only give you a speed up in specific situations where additional work can be squeezed into the pipeline.

http://en.wikipedia.org/wiki/Hyper-threading

The speedup is very work dependent and in practice for things like web pages and api servers you generally only get another 20-40% of performance from them rather than a full 100%.
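To put the 20-40% figure in numbers, here is a rough sketch that treats each physical core as one unit of throughput and hyperthreading as a flat fractional gain on top. The flat-gain model and the function name `effective_cores` are simplifying assumptions for illustration, and the 4-core machine is only an example.

```python
def effective_cores(physical_cores, ht_gain):
    """Throughput in 'core equivalents' when every hardware thread is busy.

    ht_gain is the extra throughput from the second hardware thread on each
    core, e.g. 0.2 for 20%. A value of 1.0 would mean a full second core.
    """
    if not 0.0 <= ht_gain <= 1.0:
        raise ValueError("ht_gain must be between 0.0 and 1.0")
    return physical_cores * (1.0 + ht_gain)


if __name__ == "__main__":
    # Compare the 20% and 40% cases against a hypothetical full doubling.
    for gain in (0.2, 0.4, 1.0):
        print(f"gain={gain:.0%}: {effective_cores(4, gain):.1f} core equivalents")
```

Under this model a 4-core, 8-thread machine behaves like roughly 4.8 to 5.6 cores for such workloads, not 8.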

In other words, you have no idea what the difference is.

A hyperthreaded Intel CPU has M functional units and N decode/issue pipelines.

A non-hyperthreaded Intel CPU has M' functional units and N' decode/issue pipelines.

A hyperthreaded Intel CPU with hyperthreading disabled has M functional units and N/2 decode/issue pipelines.

I'd be humored to hear your idea of what the difference is, given the misplaced use of scare-quotes.

A hyperthread core is a virtual core -- it is not actually a core at all but is a re-purposed, possibly stalled physical core. While it can improve some scenarios, in some cases (particularly core-saturating benchmarks) it can actually hurt performance.

This is hardly an out-there or controversial statement. Further, I didn't say to disable hyperthreading; I said to try setting parallelism to the number of physical cores. Again, there is nothing whatsoever controversial about that.

It is an "out there" statement because it's entirely, radically incorrect. A processor thread represents a full-blown decode and issue pipeline. A core represents a set of execution resources. Each pipeline can dispatch to any execution unit equally. In case of contention for the same execution unit, one thread issues immediately and the other thread issues next.

If you don't disable hyperthreading, but instead run four threads on an 8-thread CPU, it is extremely likely that the threads will be scheduled on the first two cores/four threads and the other two cores will be shut down, especially on the newer Intel CPUs with "turbo" features where this strategy can have large benefits.

The operating system schedules threads across cores, and the processor has zero say in the matter (further, the execution units are primarily there to facilitate branching, essentially executing future scenarios). Both Linux and Windows are hyperthread-aware, and will schedule threads to physical processors first, then to hyperthread virtual processors (given that a virtual processor shares resources with its physical core and can sabotage performance).

This is common knowledge, and your laughable obnoxiousness, which anyone who has ever worked with multithreaded code on an HT processor knows is a farce, rings pretty ridiculous.

No, the power-aware scheduler in Linux does not work as you describe. On a turbo-capable Intel CPU, if there are N program threads that will fit on M cores where M is less than the total cores on a socket, and the CPU will enter P0 state, then the threads will run on as few cores as possible and the remaining cores will be shut down.
Two threads, each running at 100%, will be assigned to two physical cores. This is reality, and is obvious given that assigning them to a physical core and a hyperthread core will give you about 130% at most, instead of the 200% that two physical cores will provide.
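The two placement strategies being argued over here can be compared with a toy model. This is only a sketch, not how either kernel's scheduler actually works: it assumes a flat 1.3x throughput for a core running both of its hardware threads (the 130% figure quoted above) and ignores turbo frequency changes entirely. The names `place_threads` and `relative_throughput` are hypothetical.

```python
from collections import Counter


def place_threads(n_threads, n_cores, policy, threads_per_core=2):
    """Return the physical core index chosen for each busy software thread.

    policy="spread": fill one hardware thread on every core before using
    any sibling. policy="pack": fill both hardware threads of a core before
    moving on to the next core.
    """
    if n_threads > n_cores * threads_per_core:
        raise ValueError("more threads than hardware threads")
    if policy == "spread":
        return [i % n_cores for i in range(n_threads)]
    if policy == "pack":
        return [i // threads_per_core for i in range(n_threads)]
    raise ValueError(f"unknown policy: {policy}")


def relative_throughput(placement, shared_core_factor=1.3):
    """Sum per-core throughput: 1.0 for one busy thread, 1.3 if shared."""
    per_core = Counter(placement)
    return sum(1.0 if busy == 1 else shared_core_factor
               for busy in per_core.values())


if __name__ == "__main__":
    # Two busy threads on a 4-core, 8-thread CPU under each policy.
    for policy in ("spread", "pack"):
        placement = place_threads(2, 4, policy)
        print(policy, placement, relative_throughput(placement))
```

In this model the spread placement scores 2.0 and the packed placement 1.3, so the disagreement comes down to which placement the scheduler actually picks for a given power profile, which is something a benchmark with pinned threads could settle.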

Unless, of course, you've set a power profile to prioritize power efficiency, but that would be an absolutely ridiculous assumption given that we're talking about benchmarks.

Well, it shouldn't matter that much anyway; I don't think modern kernels will put threads from the same process on the same physical core.

That's the reason Intel tells you to shut hyperthreading off if your operating system doesn't support it.