
by burntsushi 670 days ago

> It has many, many more features, is bigger, newer, shinier and (yes) a bit slower.

Can you show a common case where ripgrep is meaningfully slower than grep?

Here are a few common cases where ripgrep is meaningfully faster than GNU grep 3.11 on Linux.

The setup:

    $ cd /dev/shm
    $ curl -LO 'https://burntsushi.net/stuff/OpenSubtitles2018.raw.sample.en.gz'
    $ gunzip OpenSubtitles2018.raw.sample.en.gz

A simple case:

    $ time rg -c 'Sherlock Holmes' OpenSubtitles2018.raw.sample.en
    502

    real    0.102
    user    0.056
    sys     0.046
    maxmem  903 MB
    faults  0

    $ time LC_ALL=C grep -c 'Sherlock Holmes' OpenSubtitles2018.raw.sample.en
    502

    real    0.378
    user    0.255
    sys     0.122
    maxmem  21 MB
    faults  0

A case with multiple substrings:

    $ time rg -c -e 'Sherlock Holmes' -e 'John Watson' -e 'Irene Adler' -e 'Professor Moriarty' OpenSubtitles2018.raw.sample.en
    628

    real    0.128
    user    0.077
    sys     0.050
    maxmem  903 MB
    faults  0

    $ time LC_ALL=C grep -c -e 'Sherlock Holmes' -e 'John Watson' -e 'Irene Adler' -e 'Professor Moriarty' OpenSubtitles2018.raw.sample.en
    628

    real    0.580
    user    0.516
    sys     0.063
    maxmem  21 MB
    faults  0

You can run the `rg` commands with `--no-mmap` to use standard `read` syscalls instead of file-backed memory maps. The times are slightly slower but roughly similar.
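As a rough sketch, the two ripgrep queries above could be rerun without memory maps like this. The `corpus` variable and the guard around the commands are my own additions for illustration; `--no-mmap` is the only flag added to the original commands. I'm not showing timings here since they depend on the machine.

```shell
#!/bin/sh
# Sketch: rerun the two ripgrep queries with memory maps disabled.
# Assumes the sample file from the setup step is in the current directory.
corpus=OpenSubtitles2018.raw.sample.en

if command -v rg >/dev/null 2>&1 && [ -f "$corpus" ]; then
    # --no-mmap makes ripgrep read the file with ordinary read calls
    # instead of a file-backed memory map.
    time rg --no-mmap -c 'Sherlock Holmes' "$corpus"
    time rg --no-mmap -c \
        -e 'Sherlock Holmes' -e 'John Watson' \
        -e 'Irene Adler' -e 'Professor Moriarty' "$corpus"
else
    # Hypothetical guard, not part of the original commands.
    echo "need rg on PATH and $corpus in the current directory" >&2
fi
```

The match counts should be the same as in the runs above; only the way the file is read changes.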

I chose these cases because they represent common and simple queries. Specifically, each one searches a single text file. There isn't anything involving parallelism or skipping files or recursive search or whatever.