Hacker News
by wyager 3700 days ago
Worth noting that modern processors aren't just pipelined; each core contains multiple pipelined execution units operating in parallel. Using the Tomasulo algorithm and its derivatives, modern processors can execute lots of instructions ahead of earlier instructions (in program order) that haven't completed yet. The biggest issue is a branch that depends on a slow operation: if the branch was mispredicted, all the work done speculatively past it has to be thrown out.
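A toy model may make the out-of-order point more concrete. The sketch below is a heavily simplified, Tomasulo-style scheduler in Python, not a description of any real core: at issue, each instruction records the tags of its still-in-flight producers (register renaming), starts as soon as all those tags have been broadcast, and then broadcasts its own tag. The latencies, register names, and four-instruction program are hypothetical. It also assumes unlimited reservation stations and functional units and lets any number of results broadcast per cycle; modern derivatives additionally use a reorder buffer so results retire in program order.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical latencies in cycles; real values differ per microarchitecture.
LATENCY = {"load": 20, "add": 1, "mul": 3}


@dataclass
class Station:
    """One reservation-station entry; the instruction's index doubles as its rename tag."""
    op: str
    dest: str
    waiting_on: set = field(default_factory=set)  # producer tags not yet broadcast
    issued: int = 0                               # cycle the instruction was issued
    finish: Optional[int] = None                  # cycle its result is broadcast


def simulate(program):
    """Return {instruction index: completion cycle} for a toy Tomasulo-style core."""
    reg_status = {}   # register -> tag of the in-flight instruction that will write it
    stations = {}     # tag -> Station
    completed = {}
    pc, cycle = 0, 0
    while len(completed) < len(program):
        # 1. Start every issued station whose operands are all available.
        for st in stations.values():
            if st.finish is None and not st.waiting_on and st.issued < cycle:
                st.finish = cycle + LATENCY[st.op]
        # 2. Broadcast results finishing this cycle to every waiting station.
        for tag, st in stations.items():
            if st.finish == cycle:
                completed[tag] = cycle
                for other in stations.values():
                    other.waiting_on.discard(tag)
                if reg_status.get(st.dest) == tag:
                    del reg_status[st.dest]
        # 3. Issue the next instruction in program order, renaming its sources to tags.
        if pc < len(program):
            op, dest, srcs = program[pc]
            waits = {reg_status[r] for r in srcs if r in reg_status}
            stations[pc] = Station(op, dest, waits, issued=cycle)
            reg_status[dest] = pc
            pc += 1
        cycle += 1
    return completed


program = [
    ("load", "r1", ["r2"]),        # 0: r1 <- mem[r2], slow (think cache miss)
    ("add",  "r3", ["r1", "r4"]),  # 1: depends on the load
    ("mul",  "r5", ["r6", "r7"]),  # 2: independent of the load
    ("add",  "r6", ["r5", "r5"]),  # 3: depends on 2; overwrites r6, which 2 reads
]

done = simulate(program)
print(sorted(done, key=done.get))  # completion order: [2, 3, 0, 1]
```

Here the independent `mul` and its dependent `add` (instructions 2 and 3) complete at cycles 6 and 8, well before the load completes at cycle 21, so work behind a slow operation keeps flowing. Because sources are captured as tags at issue time, instruction 3 overwriting `r6` (which instruction 2 reads) causes no stall.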

Could you define "slow operations"? I'm trying to picture a CPU having to throw out work when it is basically racing from one cache miss to the next as fast as it can.
I think what wyager meant is that on a mispredicted branch the CPU pipeline is already filled with data and partial results (scheduled loads, register read/write conflicts, etc.). Now that pipelines are close to 30 stages deep, flushing all those partial results because your `if` went the other way from what was predicted can get costly. In practice, after a branch misprediction you have to wait roughly as many cycles as there are pipeline stages ahead of the point where the branch resolves before the correct-path instructions start running. And if the instruction that has to be scheduled is something complex (SSE, AES-NI, etc.), you may have to wait even longer.
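To see why "your `if` went the other way" matters so much, here is a small Python sketch of a classic textbook 2-bit saturating-counter predictor guarding a data-dependent branch, run on the same values in random and in sorted order. Real CPUs use far more elaborate predictors; the threshold of 128, the initial counter state, and the 15-cycle penalty are all hypothetical assumptions chosen only to turn misprediction counts into a rough cycle cost.

```python
import random

# Hypothetical misprediction penalty in cycles: roughly the number of stages
# between fetch and branch resolution. Real values vary by CPU.
PENALTY_CYCLES = 15


def count_mispredictions(values, threshold=128):
    """Count mispredictions of `if v >= threshold` under a 2-bit saturating counter.

    Counter states 0-1 predict "not taken", states 2-3 predict "taken".
    """
    state = 0  # assumed initial state: strongly not taken
    misses = 0
    for v in values:
        taken = v >= threshold
        if (state >= 2) != taken:
            misses += 1
        # Step toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses


rng = random.Random(0)
data = [rng.randrange(256) for _ in range(10_000)]

for label, values in (("random", data), ("sorted", sorted(data))):
    misses = count_mispredictions(values)
    print(f"{label}: {misses} mispredictions, ~{misses * PENALTY_CYCLES} cycles lost")
```

With random data the branch is a coin flip, so roughly half of the 10,000 predictions miss no matter what the counter does. With sorted data the counter mispredicts only twice, at the single switch from "not taken" to "taken", so almost no work is flushed.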