by alexpasmantier 652 days ago
Posting this here as well for reference https://news.ycombinator.com/item?id=41380671

@burntsushi

Hi! First of all, thank you for taking the time to write this. I've been using ripgrep for quite some time, and it's an amazing piece of software. Having your comment here is truly an honor.

> I'm not sure I totally get the motivation here to be honest

This is primarily a small project I started to familiarize myself with Rust. I thought that exploring the basics of ripgrep and attempting to build something similar would be a good way to get started.

> Also, the flags that it does support are overriding long-held custom that are likely to be confusing to users

Noted. I'll consider making these changes to avoid potentially confusing anyone.

> It's also pretty annoying to share screenshots of benchmarks instead of just showing a simple copyable command with a paste of the results.

I've updated the documentation with the actual commands and included a copy of the results.

> I also can't quite reproduce at least the curl benchmark

I just ran the curl benchmark again on the same machine (my work laptop, an Apple M3 MacBook), and here are the results:

  $ hyperfine "rg '[A-Z]+_NOBODY' ." "gg '[A-Z]+_NOBODY'" "ggrep -rE '[A-Z]+_NOBODY' ."

  Benchmark 1: rg '[A-Z]+_NOBODY' .
     Time (mean ± σ):      38.5 ms ±   2.2 ms    [User: 18.1 ms, System: 207.3 ms]
     Range (min … max):    33.8 ms …  42.8 ms    72 runs
  
  Benchmark 2: gg '[A-Z]+_NOBODY'
     Time (mean ± σ):      21.8 ms ±   0.8 ms    [User: 15.4 ms, System: 53.1 ms]
     Range (min … max):    20.2 ms …  23.8 ms    115 runs
  
  Benchmark 3: ggrep -rE '[A-Z]+_NOBODY' .
     Time (mean ± σ):      73.3 ms ±   0.9 ms    [User: 26.5 ms, System: 45.7 ms]
     Range (min … max):    70.8 ms …  75.6 ms    41 runs
  
  Summary
     gg '[A-Z]+_NOBODY' ran
       1.77 ± 0.12 times faster than rg '[A-Z]+_NOBODY' .
       3.36 ± 0.13 times faster than ggrep -rE '[A-Z]+_NOBODY' .

> It looks like it's assuming that the `ArrayQueue` it uses is never full?

I used a default maximum size for the queue (configurable via the --max-results argument) to pre-allocate it, as I thought this might improve performance. However, I'm currently not handling errors properly and just allowing the program to panic when the number of results exceeds the set limit.
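
For reference, here is a minimal sketch of what handling a full bounded queue without panicking could look like. It is not gg's actual code: it uses the standard library's `sync_channel` as a stand-in for crossbeam's `ArrayQueue` so that it compiles without extra crates, and `push_bounded` and `PushReport` are hypothetical names made up for the illustration. The idea is only that a rejected push is matched on and counted rather than unwrapped.

```rust
use std::sync::mpsc::{sync_channel, Receiver, TrySendError};

/// Hypothetical summary of what happened while filling the bounded queue.
#[derive(Debug, PartialEq)]
pub struct PushReport {
    pub accepted: usize,
    pub dropped: usize,
}

/// Push every item into a queue with room for `max_results` entries.
/// Items that do not fit are counted instead of causing a panic.
/// (Assumption: dropping overflow is acceptable; a real tool might
/// rather stop the search early or report an error to the user.)
pub fn push_bounded<T>(items: Vec<T>, max_results: usize) -> (Receiver<T>, PushReport) {
    // Bounded channel: the buffer is allocated with a fixed capacity.
    let (tx, rx) = sync_channel(max_results);
    let mut report = PushReport { accepted: 0, dropped: 0 };

    for item in items {
        // try_send never blocks; it returns an error when the buffer is full.
        match tx.try_send(item) {
            Ok(()) => report.accepted += 1,
            Err(TrySendError::Full(_)) => report.dropped += 1,
            // The receiver is gone, so nobody will read further results.
            Err(TrySendError::Disconnected(_)) => break,
        }
    }

    (rx, report)
}
```

Calling `push_bounded(vec!["a", "b", "c"], 2)` keeps the first two items in the queue and reports one dropped result. With `ArrayQueue` the same pattern should apply by matching on the `Result` returned by `push` instead of unwrapping it.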

> So why doesn't it have the same performance profile as ripgrep?

Given how much our execution times differ, my guess is that ripgrep's (and, by extension, gg's) bottleneck is primarily disk I/O, so differences in filesystems and underlying storage hardware could explain the very different results each of us is seeing. What do you think?