by novas0x2a
5552 days ago
One thing to check: what is your current locale? GNU grep is much, much slower when it's multibyte-aware, even if you're searching for an ASCII string. Try repeating your test with LC_ALL=C grep (don't forget to take the file cache into account, so compare runs where the file is already cached). You can check the current values of your locale with `locale`.
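
A rough way to run that comparison, sketched in Python. This is only an illustration: the helper names `time_grep` and `compare_locales` are made up here, and `en_US.UTF-8` is an assumed locale name that may not be installed on your machine (check `locale -a`). A warm-up run is done first so that every timed run reads from the file cache.

```python
import os
import subprocess
import time


def time_grep(pattern, path, lc_all):
    """Run `grep -c pattern path` with LC_ALL set; return (seconds, match_count)."""
    env = dict(os.environ, LC_ALL=lc_all)
    start = time.perf_counter()
    result = subprocess.run(
        ["grep", "-c", "--", pattern, path],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
    )
    elapsed = time.perf_counter() - start
    # grep exits with 1 when nothing matches and with 2 or more on a real error.
    if result.returncode > 1:
        raise RuntimeError(f"grep failed with status {result.returncode}")
    return elapsed, int(result.stdout.strip())


def compare_locales(pattern, path, locales=("C", "en_US.UTF-8"), repeats=3):
    """Return {locale: (best_seconds, match_count)} for each locale."""
    # Warm-up run: pulls the file into the page cache before anything is timed.
    time_grep(pattern, path, "C")
    results = {}
    for loc in locales:
        runs = [time_grep(pattern, path, loc) for _ in range(repeats)]
        # Keep the fastest run, which is the least disturbed by other load.
        results[loc] = (min(t for t, _ in runs), runs[0][1])
    return results
```

Calling `compare_locales("foo", "/path/to/big.log")` (a placeholder path) gives one timing per locale. The match counts should agree for an ASCII pattern; only the timings are expected to differ.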
I'm still surprised that it's so much slower with UTF-8, though. I guess GNU grep is naively converting back and forth between representations? There's nothing in UTF-8 that should prevent it from doing this efficiently. Even with complex patterns, it is possible to search through the file in about the same time as a simple pattern. E.g. aspell can build an FSA where each transition is O(1), making the search time more or less independent of the search pattern.
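
To make the O(1)-transition point concrete, here is a minimal sketch in Python of a byte-level automaton for a fixed string (the classic Knuth-Morris-Pratt DFA construction). It is not how GNU grep or aspell is implemented; the names `build_dfa` and `find` are invented for this example. It runs directly on the UTF-8 bytes with no decoding, which is safe because UTF-8 is self-synchronizing: a valid UTF-8 pattern can only match valid UTF-8 text at a character boundary.

```python
def build_dfa(pattern: bytes):
    """Build a KMP-style DFA: dfa[state][byte] gives the next state."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    dfa = [[0] * 256 for _ in range(len(pattern))]
    dfa[0][pattern[0]] = 1
    restart = 0  # state reached by the pattern minus its first byte
    for j in range(1, len(pattern)):
        dfa[j] = dfa[restart][:]           # on mismatch, behave like the restart state
        dfa[j][pattern[j]] = j + 1         # on match, advance
        restart = dfa[restart][pattern[j]]
    return dfa


def find(dfa, data: bytes) -> int:
    """Return the byte offset of the first match in data, or -1 if none."""
    state = 0
    accept = len(dfa)
    for i, byte in enumerate(data):
        state = dfa[state][byte]           # one table lookup per input byte
        if state == accept:
            return i - accept + 1
    return -1


pattern = "naïve".encode("utf-8")
text = "a naive and naïve approach".encode("utf-8")
offset = find(build_dfa(pattern), text)
```

The scan does one table lookup per input byte regardless of the pattern, so the search time depends on the file size rather than on what is being searched for. A full regex engine needs a more general automaton, but the per-byte cost can stay constant in the same way.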