Hacker News new | ask | show | jobs
by burntsushi 2871 days ago
In my experience, it is difficult to beat a very simple heuristic in most settings: when provided a string, pick the most infrequently occurring byte in that string, and use that as your skip loop in a vector optimized routine such as memchr. If you guess the right byte, you'll spend most of your time in a highly optimized routine that makes the most out of your CPU.

Picking the right byte can be tricky, but a static common sense ranking actually works quite well. At least, the users of ripgrep seem to think so. :-)

For some reason, I've never seen this algorithm described in the literature. It doesn't have nice theoretical properties, but it's quite practical.