by burntsushi 652 days ago
It's not disk I/O because we're using hyperfine for measuring. It does warm-up runs first, and unless your machine has a teeny amount of RAM, everything is in cache. You can put your corpus on a ramdisk to verify this (on Linux, `/tmp` usually is one and I believe `/dev/shm` always is; IDK about macOS).
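
A rough sketch of that ramdisk check, in case it helps. The function name `bench_from_ramdisk` and its two arguments are made up for illustration, and `/dev/shm` as the default assumes Linux. `--warmup` is hyperfine's flag for requesting untimed runs before measuring.

```shell
# Sketch: copy a corpus onto a RAM-backed directory and benchmark from there.
# bench_from_ramdisk is a hypothetical helper, not part of any tool.
# Usage: bench_from_ramdisk CORPUS_DIR [RAMDISK_DIR]
bench_from_ramdisk() {
    corpus="$1"
    ramdisk="${2:-/dev/shm}"   # assumption: RAM-backed on Linux; pick another path elsewhere

    # Scratch directory on the ramdisk that receives a full copy of the corpus.
    work="$(mktemp -d "$ramdisk/corpus.XXXXXX")" || return 1
    cp -R "$corpus/." "$work/" || { rm -rf "$work"; return 1; }

    # Run the same two commands as in the thread, from inside the copy,
    # with three explicit warm-up runs before timing starts.
    ( cd "$work" && hyperfine --warmup 3 "rg '[A-Z]+_NOBODY' ." "gg '[A-Z]+_NOBODY'" )
    status=$?

    rm -rf "$work"             # free the RAM again
    return "$status"
}
```

If the timings from `bench_from_ramdisk ~/src/curl` (path is just an example) match the ones taken from disk, then disk I/O isn't the explanation.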

Since you're running on macOS, I'll do the same. I have an M2 mac mini. My previous benchmarks were on my Linux workstation. Your `curl` benchmark:

    $ hyperfine "rg '[A-Z]+_NOBODY' ." "gg '[A-Z]+_NOBODY'"
    Benchmark 1: rg '[A-Z]+_NOBODY' .
      Time (mean ± σ):      20.3 ms ±   0.7 ms    [User: 18.6 ms, System: 96.0 ms]
      Range (min … max):    18.4 ms …  21.3 ms    126 runs

    Benchmark 2: gg '[A-Z]+_NOBODY'
      Time (mean ± σ):      17.9 ms ±   0.7 ms    [User: 15.6 ms, System: 38.6 ms]
      Range (min … max):    17.0 ms …  19.9 ms    141 runs

    Summary
      gg '[A-Z]+_NOBODY' ran
        1.13 ± 0.06 times faster than rg '[A-Z]+_NOBODY' .
So ripgrep is slightly edged out by `gg` here, but the difference isn't as big as what you're seeing. What version of ripgrep are you using?

Also, as I said before, these times are pretty short. Try a bigger corpus. For example, in my clone of Linux (also on my M2 mac mini):

    $ git remote -v
    origin  git@github.com:BurntSushi/linux (fetch)
    origin  git@github.com:BurntSushi/linux (push)

    $ git rev-parse HEAD
    84e57d292203a45c96dbcb2e6be9dd80961d981a

    $ hyperfine "rg '[A-Z]+_NOBODY' ." "gg '[A-Z]+_NOBODY'"
    Benchmark 1: rg '[A-Z]+_NOBODY' .
      Time (mean ± σ):     343.3 ms ±   4.2 ms    [User: 359.3 ms, System: 2243.3 ms]
      Range (min … max):   339.0 ms … 352.7 ms    10 runs

    Benchmark 2: gg '[A-Z]+_NOBODY'
      Time (mean ± σ):     351.1 ms ±   4.6 ms    [User: 326.4 ms, System: 1059.1 ms]
      Range (min … max):   348.2 ms … 363.8 ms    10 runs

    Summary
      rg '[A-Z]+_NOBODY' . ran
        1.02 ± 0.02 times faster than gg '[A-Z]+_NOBODY'
It is very interesting that the differences are almost zero on macOS but quite a bit bigger on Linux. That might be worth investigating.
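
For anyone wanting to sanity check the `Summary` lines: the ratios can be reproduced from the reported means and standard deviations. This sketch assumes plain first-order error propagation for a ratio of two means. The helper name `rel_speed` is made up for illustration.

```shell
# Sketch: recompute "N ± E times faster" from two mean/σ pairs.
# rel_speed is a hypothetical helper; the formula is an assumption
# (first-order error propagation for a ratio).
# Usage: rel_speed SLOW_MEAN SLOW_SD FAST_MEAN FAST_SD
rel_speed() {
    awk -v m1="$1" -v s1="$2" -v m2="$3" -v s2="$4" 'BEGIN {
        r = m1 / m2                                            # speed ratio
        e = r * sqrt((s1 / m1) * (s1 / m1) + (s2 / m2) * (s2 / m2))  # propagated σ
        printf "%.2f ± %.2f\n", r, e
    }'
}

rel_speed 20.3 0.7 17.9 0.7      # curl corpus: rg mean/σ, then gg mean/σ
rel_speed 351.1 4.6 343.3 4.2    # Linux corpus: gg mean/σ, then rg mean/σ
```

On the Linux corpus the uncertainty is about as large as the gap itself, which is why I'd call that one a tie.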

IMO, if you're advertising "circumstantially faster than ripgrep," then you should be able to characterize the circumstances in which that occurs.