|
|
|
|
|
by yuliyp
236 days ago
|
|
That's not the problem. The problem is that you're adding a data dependency on the CPU loading the first byte. The branch-based one just "predicts" the number of bytes in the codepoint and can keep executing code past that. In data that's ASCII, relying on the branch predictor to just guess "0" repeatedly turns out to be much faster as you can effectively be processing multiple characters simultaneously. |
|