Hacker News new | ask | show | jobs
by yuliyp 236 days ago
That's not the problem. The problem is that you're adding a data dependency on the CPU loading the first byte. The branch-based one just "predicts" the number of bytes in the codepoint and can keep executing code past that. In data that's ASCII, relying on the branch predictor to just guess "0" repeatedly turns out to be much faster as you can effectively be processing multiple characters simultaneously.
1 comments

I am pretty sure CPUs can speculative load as well. In the CPU pipeline, it sees that there's an repeated instruction to load, it should be able to dispatch and perform all of it in the pipeline. The nice thing is that there is no chance of hazard execution here because all of this speculative load is usable unlike the 1% chance where the branch would fail which causes the whole pipeline to be flushed.