|
Oh... I see the problem. It's probably the thread heuristic. When running gg and rg, make sure -T and -j, respectively, are set to the same number, because I think gg always defaults to `4`, whereas ripgrep is probably defaulting to a higher number. On very small corpora, like curl, this can actually lead to overall slower times due to the overhead of starting the threads.
This also explains why the times are faster on Linux. My Linux workstation has a lot more CPUs than my M2 mac mini: the mac mini has 8 logical CPUs while my Linux box has 24. ripgrep won't necessarily start one thread per core, but at 8 cores, it will indeed start one thread per core, whereas gg will start 4. You can see ripgrep's heuristic here: https://github.com/BurntSushi/ripgrep/blob/e0f1000df67f82ab0...
I suppose thread count heuristics are fair game for benchmarks, but in order to measure those better, you need a bigger variety of corpus sizes. Even with the Linux kernel, the difference between 4 and 8 threads for `gg` is not that big:
$ hyperfine "gg -T4 '[A-Z]+_NOBODY'" "gg -T8 '[A-Z]+_NOBODY'"
Benchmark 1: gg -T4 '[A-Z]+_NOBODY'
Time (mean ± σ): 364.3 ms ± 2.5 ms [User: 331.1 ms, System: 1108.6 ms]
Range (min … max): 360.8 ms … 369.1 ms 10 runs
Benchmark 2: gg -T8 '[A-Z]+_NOBODY'
Time (mean ± σ): 349.3 ms ± 3.1 ms [User: 454.2 ms, System: 2056.2 ms]
Range (min … max): 345.4 ms … 355.8 ms 10 runs
Summary
gg -T8 '[A-Z]+_NOBODY' ran
1.04 ± 0.01 times faster than gg -T4 '[A-Z]+_NOBODY'
But go to a bigger corpus and a difference becomes much more apparent:
$ hyperfine "gg -T4 '[A-Z]+_NOBODY'" "gg -T8 '[A-Z]+_NOBODY'"
Benchmark 1: gg -T4 '[A-Z]+_NOBODY'
Time (mean ± σ): 16.777 s ± 0.351 s [User: 1.868 s, System: 12.301 s]
Range (min … max): 16.376 s … 17.396 s 10 runs
Benchmark 2: gg -T8 '[A-Z]+_NOBODY'
Time (mean ± σ): 10.273 s ± 0.628 s [User: 1.931 s, System: 12.215 s]
Range (min … max): 8.980 s … 11.066 s 10 runs
Summary
gg -T8 '[A-Z]+_NOBODY' ran
1.63 ± 0.11 times faster than gg -T4 '[A-Z]+_NOBODY'
This is on a checkout of the Chromium repository.
The increased variety of benchmarks is important here because you might have a simpler heuristic for thread count that does result in marginally faster times overall in some cases, but this obscures what you're giving up: substantially slower times in other cases. Moreover, the cases where 4 threads come out faster than 8 tend to have very small absolute differences, i.e., not hugely perceptible by humans. |
I did set gg to default to 4 threads, which seemed to be the optimal number on my machine for the typical repo sizes I navigate daily. Increasing the number of threads beyond that often results in unnecessary overhead for my personal use cases.
I appreciate you pointing out the heuristic used in the ripgrep project. From what I understand, it also uses a fixed, machine-dependent number of threads, predetermined regardless of the task at hand (except for single-file tasks).
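To make sure I'm reading it right, here is a minimal sketch of what I mean by a fixed, machine-dependent heuristic. It is not ripgrep's actual code: `default_threads` is a hypothetical name, and the cap is a parameter because I'm not certain of the real value (the linked source has it). The only real API used is `std::thread::available_parallelism` from the standard library.

```rust
use std::thread;

/// Number of logical CPUs reported by the OS, falling back to 1 if unknown.
fn detected_cpus() -> usize {
    thread::available_parallelism().map_or(1, |n| n.get())
}

/// Hypothetical fixed heuristic: one thread per logical CPU up to `cap`,
/// except that a single-file search always runs on one thread.
/// `cap` is an assumed tuning knob, not a value taken from any real tool.
fn default_threads(logical_cpus: usize, cap: usize, single_file: bool) -> usize {
    if single_file {
        return 1;
    }
    // Guard against a reported count of 0 and against a cap of 0.
    logical_cpus.clamp(1, cap.max(1))
}
```

With an assumed cap of 12, `default_threads(detected_cpus(), 12, false)` would give 8 on an 8-CPU machine and 12 on a 24-CPU one, which matches the "one thread per core at 8 cores, but not necessarily beyond" behavior described above. Nothing in it looks at how much work there is, which is what the question below is about.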
This is something I was curious about while writing the code but couldn't fully answer due to my limited knowledge of the subject: could we potentially use a filesystem-specific heuristic to estimate the workload and dynamically adjust the number of threads accordingly?
What I mean is a method, perhaps within the ignore crate, to estimate the amount of data to process—such as the number of files, file sizes, or number of lines—based on easily and cheaply accessible filesystem metadata.
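To make the idea concrete, here is a rough sketch under several assumptions. It uses plain `std::fs` rather than the ignore crate, so it does not honor ignore rules. The names `Estimate`, `probe` and `threads_for` are hypothetical, and the `BYTES_PER_THREAD` threshold is made up rather than measured. The probe reads only directory entries and file metadata, and it gives up after a fixed entry budget so that the estimate itself stays cheap on huge trees.

```rust
use std::fs;
use std::path::Path;

/// Result of a bounded, metadata-only probe of a directory tree.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
struct Estimate {
    files: u64,
    bytes: u64,
    /// True if the probe hit its entry budget before finishing the walk.
    truncated: bool,
}

/// Walk `root` reading only metadata, stopping after `budget` entries.
/// Unreadable directories and entries are skipped; symlinks are not followed.
fn probe(root: &Path, budget: usize) -> Estimate {
    let mut est = Estimate::default();
    let mut stack = vec![root.to_path_buf()];
    let mut seen = 0usize;
    while let Some(dir) = stack.pop() {
        let Ok(entries) = fs::read_dir(&dir) else { continue };
        for entry in entries.flatten() {
            if seen >= budget {
                est.truncated = true;
                return est;
            }
            seen += 1;
            let Ok(file_type) = entry.file_type() else { continue };
            if file_type.is_dir() {
                stack.push(entry.path());
            } else if file_type.is_file() {
                if let Ok(meta) = entry.metadata() {
                    est.files += 1;
                    est.bytes += meta.len();
                }
            }
        }
    }
    est
}

/// Map an estimate to a thread count, never exceeding `max_threads`.
fn threads_for(est: Estimate, max_threads: usize) -> usize {
    // Hypothetical threshold: one extra thread per 32 MiB of file data.
    const BYTES_PER_THREAD: u64 = 32 * 1024 * 1024;
    let max_threads = max_threads.max(1);
    if est.truncated {
        // The tree is at least as large as the budget: assume it is big.
        return max_threads;
    }
    let wanted = (est.bytes / BYTES_PER_THREAD) as usize + 1;
    wanted.min(max_threads)
}
```

Something like `threads_for(probe(Path::new("."), 10_000), 8)` would then pick 1 thread for a tiny checkout and 8 for anything that exhausts the budget. The obvious catch is that the probe is a second directory walk, so whether it pays for itself is exactly the kind of thing that would need benchmarking across corpus sizes.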