Worth noting that modern processors aren't just pipelined; each core is composed of multiple pipelined execution units operating in parallel. Using Tomasulo's algorithm and its derivatives, a modern processor can execute many later instructions before earlier ones in program order have completed. The biggest issue is a branch that depends on a slow operation: the processor keeps speculating past the branch until that operation finishes, so a misprediction forces it to throw out a lot of work.
Could you define slow operations? I am trying to better imagine a CPU having to throw out work where it is basically trying to get from a cache-miss (failure) to cache-miss (failure) as fast as it can.
I think what wyager meant is that on a mispredicted branch you have a CPU pipeline already filled with data and partial results (scheduled loads, r/w register conflicts, etc.). Now that pipelines are close to 30 stages deep, flushing all those partial results because your `if` took a different path than usual can get costly. In practice, after a branch misprediction you'll have to wait at least [pipeline_stages] cycles until you see the next instruction run. But if the scheduled instruction was something complex that comes from SSE / AES-NI / etc., you may have to wait even longer.
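To put rough numbers on that, here is a toy model, not a description of any real CPU. It assumes a single 2-bit saturating counter predicting one `if`, and charges a flat `FLUSH_PENALTY` of 30 cycles per misprediction (borrowed from the "close to 30 stages" figure above). The predictor choice, the penalty value, and all function names are made up for illustration.

```python
import random

FLUSH_PENALTY = 30  # assumed cycles lost per misprediction (illustrative only)


def count_mispredictions(outcomes):
    """Run one 2-bit saturating counter over a sequence of branch outcomes."""
    state = 0  # 0-1 predict not taken, 2-3 predict taken
    misses = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            misses += 1
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses


def branch_outcomes(values, threshold=128):
    """Outcomes of `if v >= threshold` for each value."""
    return [v >= threshold for v in values]


rng = random.Random(0)  # fixed seed so runs are repeatable
data = [rng.randrange(256) for _ in range(10_000)]

random_misses = count_mispredictions(branch_outcomes(data))
sorted_misses = count_mispredictions(branch_outcomes(sorted(data)))

print("random order:", random_misses, "mispredictions,",
      random_misses * FLUSH_PENALTY, "cycles lost")
print("sorted order:", sorted_misses, "mispredictions,",
      sorted_misses * FLUSH_PENALTY, "cycles lost")
```

With sorted input the branch flips direction only once, so the counter mispredicts just a couple of times; with random input it is wrong about half the time. Real predictors are far more sophisticated and the real penalty is not a constant, so treat the cycle counts as order-of-magnitude intuition only.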
The real thing we care about is "wasted processor cycles" and where they come from. We measure some sources (branch mispredictions, instruction cache thrashing, etc.) and some potential sources (e.g. cache misses). What we lack is a metric for how bad each instance is. Not every branch misprediction has the same cost. With cache misses the cost can be nearly zero or very high. It would be nice to be able to measure (or simulate or estimate) the real magnitude of each problem.
Until we have that, cache misses are a useful metric on their own, provided one is aware of their caveats. Like most things, cache misses come in various shades of grey.
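As a rough illustration of why the per-miss cost varies so much, here is a toy estimate, not a formula taken from any paper. It assumes every miss takes a fixed `MISS_LATENCY` of 200 cycles and that the core is stalled whenever at least one miss is outstanding, so misses that overlap in time share the stall. The latency, the stall assumption, and the function names are all hypothetical.

```python
MISS_LATENCY = 200  # assumed cycles to service one miss (illustrative only)


def stalled_cycles(issue_times, latency=MISS_LATENCY):
    """Cycles during which at least one miss is outstanding.

    Each miss occupies [t, t + latency); overlapping time is counted once.
    """
    total = 0
    busy_until = None
    for start in sorted(issue_times):
        end = start + latency
        if busy_until is None or start >= busy_until:
            total += latency            # no overlap with earlier misses
        elif end > busy_until:
            total += end - busy_until   # only the part not already covered
        busy_until = end if busy_until is None else max(busy_until, end)
    return total


def average_miss_cost(issue_times, latency=MISS_LATENCY):
    """Stalled cycles divided by the number of misses."""
    return stalled_cycles(issue_times, latency) / len(issue_times)


isolated = [0, 1000, 2000, 3000]  # four misses far apart
clustered = [0, 10, 20, 30]       # four misses issued back to back

print(average_miss_cost(isolated))   # 200.0
print(average_miss_cost(clustered))  # 57.5
```

The same four misses cost 200 cycles each when isolated but under 60 each when clustered. The model is pessimistic, since an out-of-order core can often keep doing useful work under a miss, which is how the cost can drop toward zero.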
This discussion reminded me of a formula for calculating the average cost of a cache miss in this pretty cool paper ("An Analysis of the Effects of Miss Clustering on the Cost of a Cache Miss").