HN Mirror

	by onetimePete 3697 days ago
	There is only so much work to go around before cache misses turn into waiting games, even with branch prediction and micro-code pipelining. What is your suggested alternative metric?

wyager 3697 days ago

Worth noting that modern processors aren't just pipelined; each core contains multiple pipelined execution units operating in parallel. Using the Tomasulo algorithm and its derivatives, modern processors can execute many instructions out of order, before instructions that precede them in program order have completed. The biggest issue is a branch that depends on a slow operation: if you mispredict it, you have to throw out all the work done speculatively past it.
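
To illustrate the point about later instructions finishing first, here is a minimal sketch. It is not Tomasulo's algorithm itself (no reservation stations, common data bus, or reorder buffer); it is an idealized dataflow timing model that assumes one instruction issued per cycle, unlimited execution units, and perfect register renaming. The `Instr` type, the program, and all latencies are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    name: str      # label used in the output
    dest: str      # register written
    srcs: tuple    # registers read
    latency: int   # execution latency in cycles (hypothetical values)

def schedule(program):
    """Idealized dataflow timing: one instruction issued per cycle in program
    order, unlimited execution units, perfect register renaming. Each
    instruction starts once it is issued and all its source operands are ready.
    Returns {name: (start_cycle, finish_cycle)}."""
    ready_at = {}  # register -> cycle when its newest value becomes available
    times = {}
    for issue_cycle, ins in enumerate(program):
        start = max([issue_cycle] + [ready_at.get(r, 0) for r in ins.srcs])
        finish = start + ins.latency
        ready_at[ins.dest] = finish
        times[ins.name] = (start, finish)
    return times

# Hypothetical program: a load that misses the cache, one instruction that
# depends on it, and two instructions that do not.
PROGRAM = [
    Instr("load", "r1", (), 200),           # cache miss to memory
    Instr("use_load", "r2", ("r1",), 1),    # must wait for the load
    Instr("mul", "r3", ("r4", "r5"), 3),    # independent of the load
    Instr("add", "r6", ("r3", "r4"), 1),    # depends only on mul
]

times = schedule(PROGRAM)
for name, (start, finish) in times.items():
    print(f"{name:9s} start={start:4d} finish={finish:4d}")
```

In this model `mul` and `add` finish at cycles 5 and 6, long before the load that precedes them completes at cycle 200; only `use_load` has to wait. If a mispredicted branch sat behind that load, all of that early work would be discarded.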

audi100quattro 3697 days ago

Could you define "slow operations"? I am trying to better picture a CPU having to throw out work when it is basically racing from one cache miss to the next as fast as it can.

viraptor 3697 days ago

I think what wyager meant is that on a mispredicted branch, the CPU pipeline is already filled with data and partial results (scheduled loads, register read/write conflicts, etc.). With pipelines now 15 to 30 stages deep, flushing all those partial results because your `if` went the other way than usual can get costly. In practice, after a branch misprediction you'll wait roughly [pipeline_stages] cycles before the next correct instruction runs. And if the instruction that has to run next is a long-latency one, such as some SSE or AES-NI instructions, you may wait even longer.
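
A toy model of the cost described above: a single 2-bit saturating-counter predictor (a classic textbook scheme, not what any particular modern CPU uses) fed hypothetical branch outcome streams, with every misprediction charged the same flush penalty. The 20-cycle `PIPELINE_DEPTH` is an assumed value standing in for [pipeline_stages].

```python
def count_mispredictions(outcomes):
    """Run one 2-bit saturating counter over a sequence of branch outcomes
    (True = taken). States 0-1 predict not taken, 2-3 predict taken.
    The counter starts in state 2 (weakly taken)."""
    counter = 2
    misses = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken != taken:
            misses += 1
        # Move one step toward the actual outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses

PIPELINE_DEPTH = 20  # assumed flush penalty in cycles, standing in for [pipeline_stages]

def wasted_cycles(outcomes, penalty=PIPELINE_DEPTH):
    """Cycles lost to flushes, assuming every misprediction costs the same."""
    return count_mispredictions(outcomes) * penalty

# Hypothetical outcome streams of 100 branches each.
loop_branch = ([True] * 9 + [False]) * 10   # loop back-edge, 10 iterations per trip
alternating = [True, False] * 50            # a pattern this predictor handles poorly

print(count_mispredictions(loop_branch), wasted_cycles(loop_branch))   # 10 200
print(count_mispredictions(alternating), wasted_cycles(alternating))   # 50 1000
```

The fixed penalty is the weak point of this model: a misprediction that resolves only after a long-latency instruction costs more than the pipeline depth alone.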

halomru 3697 days ago

The real thing we care about is "wasted processor cycles" and their sources. We measure some sources (branch mispredictions, instruction cache thrashing, etc.) and some potential sources (e.g. cache misses). What we lack is a metric for how bad each instance is. Not every branch misprediction has the same cost. With cache misses, the cost can be nearly zero (when independent work hides the latency) or very high. It would be nice to be able to measure (or simulate, or estimate) the real magnitude of each problem.

Until we have that, cache misses are a useful metric on their own, as long as one is aware of their caveats. Like most things, cache misses come in various shades of grey.

onetimePete 3697 days ago

So you suggest cache misses corrected by an instruction-workload bias (which should in turn be biased by how "hot" the instructions remain)?

EvilOfCacheMiss = (TimeOfMemoryFetchCycles - CyclesSpentDoingInstructions) / TimeOfMemoryFetchCycles

azernik 3697 days ago

More like TimeOfMemoryFetchCycles - CyclesSpentDoingInstructions

But yeah, sounds like a useful metric to me.
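
A minimal sketch of both versions of the metric, assuming `fetch_cycles` stands for the memory fetch time in cycles and `overlap_cycles` for CyclesSpentDoingInstructions (cycles of independent work executed while the fetch is outstanding). Clamping at zero is an added assumption: once enough work overlaps the fetch, the miss is fully hidden and should not count as a negative cost. The function names and numbers are hypothetical.

```python
def exposed_stall_cycles(fetch_cycles, overlap_cycles):
    """azernik's form: fetch time minus cycles spent doing instructions.
    Clamped at 0 (an added assumption): a fully hidden miss costs nothing."""
    return max(fetch_cycles - overlap_cycles, 0)

def evil_of_cache_miss(fetch_cycles, overlap_cycles):
    """onetimePete's normalized form: the fraction of the fetch latency left
    exposed, from 0.0 (fully hidden) to 1.0 (nothing overlapped)."""
    return exposed_stall_cycles(fetch_cycles, overlap_cycles) / fetch_cycles

# Hypothetical misses to memory, each taking 200 cycles to fetch.
print(exposed_stall_cycles(200, 50), evil_of_cache_miss(200, 50))    # 150 0.75
print(exposed_stall_cycles(200, 250), evil_of_cache_miss(200, 250))  # 0 0.0
```

The absolute form is easier to sum across many misses; the normalized form makes misses with different latencies comparable.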

matt_d 3696 days ago

This discussion reminded me of a formula for calculating the average cost of a cache miss in this pretty cool paper ("An Analysis of the Effects of Miss Clustering on the Cost of a Cache Miss").

In particular, see Equation 4: http://researcher.ibm.com/files/us-viji/miss-cluster.pdf
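
The paper's Equation 4 is not reproduced here. As a much simpler illustration of why miss clustering matters (an assumed model, not the paper's): suppose each miss stalls for a fixed latency starting at its issue cycle, overlapping misses are serviced in parallel, and no other work hides them. The exposed stall time is then the length of the union of the miss intervals, and dividing by the miss count gives an average cost per miss. The function name, issue cycles, and 200-cycle latency are hypothetical.

```python
def average_miss_cost(issue_cycles, latency):
    """Average exposed stall cycles per miss. Each miss occupies the interval
    [issue, issue + latency); overlapping intervals are merged so that misses
    serviced in parallel share one stall. Assumes no other work hides them."""
    if not issue_cycles:
        return 0.0
    stalled = 0
    cur_start = cur_end = None
    for t in sorted(issue_cycles):
        if cur_end is not None and t < cur_end:
            # Overlaps the current cluster: extend it.
            cur_end = max(cur_end, t + latency)
        else:
            # Starts a new cluster: close out the previous one first.
            if cur_end is not None:
                stalled += cur_end - cur_start
            cur_start, cur_end = t, t + latency
    stalled += cur_end - cur_start
    return stalled / len(issue_cycles)

spread = [0, 1000, 2000, 3000]   # isolated misses
clustered = [0, 1, 2, 3]         # back-to-back misses that overlap
print(average_miss_cost(spread, 200))     # 200.0
print(average_miss_cost(clustered, 200))  # 50.75
```

Both cases have the same miss count, yet the clustered misses cost roughly a quarter as much each, which is the kind of difference a raw miss count hides.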
