| HN Mirror



	by nxobject 545 days ago
	Apropos of SIMD – I'm also surprised that the inner loop of the rejection-based algorithm optimized to MMX, but not the analytic algorithm! I would like to think the rejection algorithm after -O3 is benefiting from branch prediction and all sorts of modern speculation optimizations. But I imagine the real test of that would be running these benchmarks on a 5-10ish year old uarch.

1 comments

Sesse__ 545 days ago

Ten years ago, you already had Skylake with pretty good indirect branch predictors...


Veliladon 542 days ago

It doesn't matter, because you can't predict random data, which is what the example is using. Branch predictors work on the premise that 95% of the time the branch resolves the same way, which is damn common in most code. The closer your branch outcomes look to a normal distribution vs bimodal, the worse the branch predictor is going to perform.

The rejector is going to be faster because more iterations can be kept in the reorder buffer at once. The CPU is going to have to keep all the instructions post-branch in the reorder buffer until the branch is retired. If it's waiting on an instruction with an especially long latency like a square root it's probably speculatively already finished the work anyway, rejected or not.

If the branch rejects after the work is done, the reorder buffer can retire each instruction of the random generation algorithm as it comes and then wait on however many branches at the end, which can also be run in parallel because those branches aren't dependent on each other. All those branches which are doing a square root will also be pipelined properly instead of putting bubbles everywhere.
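For readers who haven't seen the two approaches side by side, here is a minimal sketch of their shape. The thread doesn't spell out the exact problem, so this assumes uniform sampling from the unit disk. The function names are made up for illustration and this is not the benchmarked code.

```python
import math
import random

def sample_rejection(rng):
    """Draw from the square [-1, 1]^2 and retry until the point is in the disk."""
    while True:
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:  # data-dependent branch, accepts ~pi/4 of draws
            return x, y

def sample_analytic(rng):
    """No retry loop: map two uniforms to a radius and an angle."""
    r = math.sqrt(rng.random())  # sqrt keeps the density uniform over area
    theta = 2.0 * math.pi * rng.random()
    return r * math.cos(theta), r * math.sin(theta)

rng = random.Random(0)  # fixed seed so runs are repeatable
rejection_points = [sample_rejection(rng) for _ in range(10_000)]
analytic_points = [sample_analytic(rng) for _ in range(10_000)]
```

Python won't reproduce any of the reorder-buffer or pipelining effects discussed here. The point is only that each rejection iteration is independent of the previous one, whereas the analytic version trades the branch for a sqrt plus trig calls.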


Sesse__ 540 days ago

My point was that there hasn't been a revolution in branch prediction (or really any sort of speculation) in the last 5–10 years. Things have become incrementally better as always, and the M1 presumably has some sort of pointer value speculation, but by and large, things are similar. So running on a 5–10 year old microarchitecture won't change the qualitative results much.

If you talked about running on a 30-year-old architecture, then sure, that would tell you something about the effect of modern speculation on the code.
