Hacker News
by sild 2106 days ago
I put the benchmark into quick-bench but could not replicate the 40% result. The sentinel version was faster but only slightly.

https://quick-bench.com/q/314Z81FskTlcDqMCUHFVhWmDz8Q
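The benchmarked code is behind the quick-bench links and is not reproduced here. For context, here is a minimal C++ sketch of the two variants being compared. It is not the code that was benchmarked, and the names `find_plain` and `find_sentinel` are hypothetical. The plain loop does two comparisons per element (the bound and the key). The sentinel loop temporarily writes the key into the last slot, so it only needs the key comparison and is still guaranteed to stop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain linear search: each iteration checks the loop bound and the key.
// Returns v.size() when the key is not present.
std::size_t find_plain(const std::vector<int>& v, int key) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (v[i] == key) return i;
    }
    return v.size();
}

// Sentinel linear search: plant the key in the last slot so the inner loop
// needs no bounds check, then restore the original value.
// Needs a mutable buffer. Returns v.size() when the key is not present.
std::size_t find_sentinel(std::vector<int>& v, int key) {
    if (v.empty()) return 0;
    const std::size_t last = v.size() - 1;
    const int saved = v[last];
    v[last] = key;                 // plant the sentinel
    std::size_t i = 0;
    while (v[i] != key) ++i;       // always terminates at index `last` or earlier
    v[last] = saved;               // restore the original element
    if (i < last || saved == key) return i;
    return v.size();               // only the sentinel matched: not found
}
```

The sentinel trick removes the bound comparison from the hot loop. That is the only source of the speed-up, so a large gap between the two versions deserves suspicion.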

Update 1: After moving some constants around, I get the 40% result:

https://quick-bench.com/q/lPrpQTAyDQuOoKS9MBWCTBXk1TE

No idea why it made such a big difference to the benchmark.

Update 2: If the test order is reversed, the result goes back to being only slightly faster for the sentinel version:

https://quick-bench.com/q/Ds7aqe5-6md_tTPndOK54ltYZmE
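When swapping the test order changes the result, the difference is probably coming from the harness (warm-up, CPU frequency, cache or allocator state) rather than from the code under test. One way to check is to time both variants in several rounds, alternate which one goes first, and keep the best time for each. Below is a standalone `std::chrono` sketch of that idea. It is not the quick-bench harness, and `mean_ns` and `best_of_alternating` are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>
#include <utility>

// Time `reps` calls of f and return the mean nanoseconds per call.
// Results are accumulated into `sink` so the calls have an observable effect.
template <class F>
double mean_ns(F f, int reps, std::size_t& sink) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int r = 0; r < reps; ++r) sink += f();
    const auto stop = clock::now();
    return std::chrono::duration<double, std::nano>(stop - start).count() / reps;
}

// Time both variants once per round, alternating which one runs first,
// and keep the minimum per variant. Returns {best time of a, best time of b}.
template <class FA, class FB>
std::pair<double, double> best_of_alternating(FA a, FB b, int rounds, int reps,
                                              std::size_t& sink) {
    double best_a = std::numeric_limits<double>::infinity();
    double best_b = best_a;
    for (int r = 0; r < rounds; ++r) {
        if (r % 2 == 0) {
            best_a = std::min(best_a, mean_ns(a, reps, sink));
            best_b = std::min(best_b, mean_ns(b, reps, sink));
        } else {
            best_b = std::min(best_b, mean_ns(b, reps, sink));
            best_a = std::min(best_a, mean_ns(a, reps, sink));
        }
    }
    return {best_a, best_b};
}
```

If the ratio between the two results stays stable regardless of which variant goes first, the measured difference is more likely to be real.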


In the first two links you sent, the 40% result looks like the baseline case getting slower, not the unit under test getting faster. The core assembly looks identical in both cases.
Well spotted, and kudos to sild! It looks like the 40% claim came from a bug in the benchmarking (which makes sense, and can happen).
The order in which the tests were run was the first thing I checked in his implementation, but I looked too quickly and thought he was generating the data for each variant, so I assumed that was not the problem. [Strictly speaking, you want the same data for both tests, but generated twice, once for each.]
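One way to get "the same data, generated twice" is to build each variant's input from a fixed seed inside that variant's own setup. Both variants then see identical values in separately allocated buffers, which matters here because the sentinel version writes to its input. This C++ sketch assumes a `std::mt19937` generator and a hypothetical `make_data` helper. Neither comes from the original benchmark.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Build one variant's input. A fixed seed makes repeated calls produce the
// same values (within one standard library build), while each call returns
// a freshly allocated vector.
std::vector<int> make_data(std::size_t n, std::uint32_t seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> dist(0, 1000000);
    std::vector<int> v(n);
    for (auto& x : v) x = dist(rng);
    return v;
}
```

Each benchmark would call `make_data(n, seed)` with the same `n` and `seed` in its own setup, instead of sharing one buffer or drawing different random inputs.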

I was going to point out that a 40% difference would mean the version without the sentinel could be improved. I was also going to check whether something was preventing branch prediction from absorbing the cost of the extra bounds check: memory is only being read, so nothing should be invalidated...

I am fairly sure the original 40% difference was due to a bug in the benchmarking. If not, it is food for thought for improving the non-sentinel version.