Hacker News
by pjscott 4948 days ago
How does fast character counting help with full-text search?

The best search algorithms can skip ahead upon a mismatch. A variable-length encoding requires branch instructions in the inner loop, leading to pipeline flushes and potentially dramatic slowdown.
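This is incorrect. Searching text with a variable-length encoding does not require extra branch instructions. If you're searching through UTF-8 text, you can just pretend it's a bunch of bytes and search through that. UTF-8 is self-synchronizing: lead bytes and continuation bytes come from disjoint ranges, so a valid UTF-8 needle can only match starting at a character boundary in valid UTF-8 text. Skip-ahead algorithms work on the bytes unchanged.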
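
To make that concrete, here is a rough sketch of a skip-ahead search (Boyer-Moore-Horspool, picked only as a familiar example) that runs directly over UTF-8 bytes. The name `horspool_find` and the sample strings are made up for illustration. Nothing in the loop decodes a character or checks for multi-byte sequences.

```python
def horspool_find(haystack: bytes, needle: bytes) -> int:
    """Return the byte offset of the first match, or -1 if there is none."""
    m, n = len(needle), len(haystack)
    if m == 0:
        return 0
    # Shift table indexed by byte value: how far the window may slide
    # when the byte under the needle's last position is that value.
    shift = [m] * 256
    for i in range(m - 1):
        shift[needle[i]] = m - 1 - i
    pos = 0
    while pos <= n - m:
        if haystack[pos:pos + m] == needle:
            return pos
        # Skip ahead based on a single byte; no decoding involved.
        pos += shift[haystack[pos + m - 1]]
    return -1


# Hypothetical sample text with multi-byte characters.
hay = "naïve café, naïve thé".encode("utf-8")
pat = "thé".encode("utf-8")
pos = horspool_find(hay, pat)
# The match starts on a character boundary, so the prefix decodes cleanly.
prefix = hay[:pos].decode("utf-8")
```

The offset you get back is a byte offset. If you need a character index, `len(prefix)` gives it, but that conversion is only needed for a reported hit, not in the inner loop.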

This isn't counting problems with normalization, of course. You will have to put your needle and haystack both into the same normalization form before searching. But you had to do that anyway.
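
A small sketch of the normalization step, using Python's standard `unicodedata.normalize`. The helper name `normalized_find` and the choice of NFC are assumptions for the example; any form works as long as needle and haystack use the same one.

```python
import unicodedata


def normalized_find(haystack: str, needle: str, form: str = "NFC") -> int:
    """Normalize both strings to the same form, then search the UTF-8 bytes.

    Returns a byte offset into the normalized haystack, or -1.
    """
    hay = unicodedata.normalize(form, haystack).encode("utf-8")
    pat = unicodedata.normalize(form, needle).encode("utf-8")
    return hay.find(pat)


composed = "caf\u00e9"      # 'é' as a single code point
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent

# Without normalization the byte sequences differ, so the search misses.
raw_result = ("un " + decomposed).encode("utf-8").find(composed.encode("utf-8"))
# With both sides in the same form, the byte search finds the match.
norm_result = normalized_find("un " + decomposed, composed)
```

Note that the returned offset refers to the normalized haystack, which may not line up with offsets in the original string.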