by burntsushi
2871 days ago
In my experience, it is difficult to beat a very simple heuristic in most settings: when given a string to search for, pick the byte in that string that you expect to occur least often in the text being searched, and use that byte as your skip loop in a vector-optimized routine such as memchr. If you guess the right byte, you'll spend most of your time in a highly optimized routine that makes the most of your CPU. Picking the right byte can be tricky, but a static, common-sense ranking of byte frequencies actually works quite well. At least, the users of ripgrep seem to think so. :-) For some reason, I've never seen this algorithm described in the literature. It doesn't have nice theoretical properties, but it's quite practical.
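To make the idea concrete, here is a minimal Python sketch of the heuristic described above. It is only an illustration under some loud assumptions: the `COMMON` ranking is made up for this example and is not ripgrep's actual table, the names `rank`, `rarest_offset`, and `find` are hypothetical, and `bytes.find` on a single byte stands in for memchr.

```python
# Hypothetical static ranking, most common first. These are illustrative
# guesses for English-like text, not the table ripgrep ships with.
COMMON = b" etaoinshrdlucmfwypvbgkqjxz"


def rank(byte: int) -> int:
    """Assumed frequency rank of a byte: higher means more common."""
    idx = COMMON.find(bytes([byte]).lower())
    # Bytes missing from the table are assumed to be rare (rank 0).
    return 0 if idx < 0 else len(COMMON) - idx


def rarest_offset(needle: bytes) -> int:
    """Offset within needle of the byte with the lowest assumed rank."""
    return min(range(len(needle)), key=lambda i: rank(needle[i]))


def find(haystack: bytes, needle: bytes) -> int:
    """Return the first index of needle in haystack, or -1 if absent."""
    if not needle:
        return 0
    off = rarest_offset(needle)
    rare = needle[off:off + 1]
    pos = off  # the rare byte of any match sits at index >= off
    while True:
        # Skip loop: a single-byte search, standing in for memchr.
        hit = haystack.find(rare, pos)
        if hit < 0:
            return -1
        start = hit - off
        # Candidate found: verify the whole needle at that position.
        if haystack.startswith(needle, start):
            return start
        pos = hit + 1
```

With this ranking, `find(b"hello world", b"world")` scans for `w` rather than `o` or `l`, so the skip loop stops only once before verifying the match. If the guess is wrong and the chosen byte is common in the haystack, the loop falls back to verifying many candidates, which is the "no nice theoretical properties" part.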