Hacker News
by gcr 635 days ago
For the purpose of searching, wouldn’t it be sufficient to do NFC normalization for text? Could hide that behind a command line flag even…

Can you say how that differs from what I suggested in my last paragraph? I legitimately can't tell if you're trying to suggest something different or not.

As UTS #18 (section 2.1) says, it isn't sufficient to just normalize the text you're searching. The user also has to craft their regex appropriately. If you normalize the text to NFC but your regex is written in NFD, oops. So it's probably best to expose a flag that lets you pick the normalization form.
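To make the NFC/NFD mismatch concrete, here's a small Python sketch using the standard `unicodedata` and `re` modules. The `search` helper and its `form` parameter are hypothetical stand-ins for a normalization flag and are not part of ripgrep. Only the haystack gets normalized, which is why the form the regex was written in matters.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical example data: "é" is U+00E9 in NFC, but "e" + U+0301 in NFD.
haystack = "caf\u00e9 au lait"  # text being searched, stored in NFC
pattern = "cafe\u0301"          # the user's regex, typed in NFD


def search(pattern: str, haystack: str, form: Optional[str] = None) -> bool:
    """Return True if `pattern` matches somewhere in `haystack`.

    `form` plays the role of a hypothetical normalization flag: when set
    (e.g. "NFC" or "NFD"), only the haystack is normalized before matching.
    The pattern is left exactly as the user wrote it.
    """
    if form is not None:
        haystack = unicodedata.normalize(form, haystack)
    return re.search(pattern, haystack) is not None


print(search(pattern, haystack))         # False: the code points differ
print(search(pattern, haystack, "NFC"))  # False: haystack is NFC, regex is NFD
print(search(pattern, haystack, "NFD"))  # True: both sides are now NFD
```

Normalizing the pattern string as well would fix this literal case, but blindly normalizing regex syntax can change its meaning. For example, `[é]` decomposed to NFD becomes a class matching either `e` or U+0301 on its own, not the two-code-point sequence. That's part of why the regex itself has to be written with the chosen form in mind.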

And yes, it would have to be behind a CLI flag. Always doing normalization would likely make ripgrep slower than a naive grep written in Python. Yes. That bad. Yes. Really. And it might now become clear why a lot of tools don't do this.