by burntsushi
1113 days ago
For GNU grep in particular, no, using a UTF-8 locale can significantly slow it down:

    $ time LC_ALL=C grep -E '^\w{30}$' OpenSubtitles2018.raw.sample.en -c
    3
    real 0.808
    user 0.744
    sys 0.063
    maxmem 10 MB
    faults 0
    $ time LC_ALL=en_US.UTF-8 grep -E '^\w{30}$' OpenSubtitles2018.raw.sample.en -c
    4
    real 20.064
    user 19.982
    sys 0.077
    maxmem 10 MB
    faults 0

Whereas ripgrep is Unicode-aware by default and is still about as fast as the ASCII-only GNU grep run above (roughly 1.2s versus 0.8s, compared to 20s for GNU grep in the UTF-8 locale):

    $ time rg '^\w{30}$' OpenSubtitles2018.raw.sample.en -c
    4
    real 1.163
    user 1.132
    sys 0.030
    maxmem 916 MB
    faults 0
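The match counts differ (3 in the C locale, 4 in the UTF-8 locale and with ripgrep), and one plausible reading is that in the C locale `\w` only covers ASCII word characters, so a line made of non-ASCII letters cannot match. The sketch below illustrates that semantic difference using Python's `re` module as a stand-in: `re.ASCII` loosely mirrors `LC_ALL=C`, and the default Unicode `\w` loosely mirrors a UTF-8 locale or ripgrep's default. It does not use or reproduce GNU grep or ripgrep, and the sample lines are hypothetical, not taken from the OpenSubtitles file.

```python
import re

# Hypothetical sample lines; NOT taken from OpenSubtitles2018.raw.sample.en.
LINES = [
    "a" * 30,         # 30 ASCII word characters
    "\u00e9" * 30,    # 30 x U+00E9 ('é'): 30 letters, but 60 bytes in UTF-8
    "a" * 29 + "!",   # the trailing '!' is not a word character
]

# ASCII-only \w, loosely analogous to GNU grep under LC_ALL=C.
ASCII_WORD30 = re.compile(r"^\w{30}$", re.ASCII)

# Unicode-aware \w (Python's default for str patterns), loosely analogous
# to GNU grep in a UTF-8 locale or to ripgrep's default behavior.
UNICODE_WORD30 = re.compile(r"^\w{30}$")


def count_matches(pattern: re.Pattern, lines: list[str]) -> int:
    """Count the lines containing a match, similar in spirit to `grep -c`."""
    return sum(1 for line in lines if pattern.search(line))


print(count_matches(ASCII_WORD30, LINES))    # 1
print(count_matches(UNICODE_WORD30, LINES))  # 2
```

Only the Unicode-aware pattern accepts the line of accented letters, because it treats each `é` as one word character rather than rejecting it as non-ASCII. This says nothing about performance; the slowdown in the transcript is specific to how GNU grep handles multibyte locales.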