Hacker News
by esac 3587 days ago
I'm pretty sure the bottleneck is disk I/O and not the CPU

When searching for a literal string, the bottleneck tends to be memory bandwidth. When doing regex searches, the bottleneck is usually CPU. If caches are cold, then disk I/O is the limiting factor. Even in that case, technologies like NCQ allow some degree of concurrency.
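A rough way to see the literal-vs-regex difference for yourself is to time both kinds of search over a buffer that is already in RAM, which takes disk I/O out of the picture. The sketch below is hypothetical Python, not anything from grep or ag, and the haystack, needle, and pattern are made up for illustration. CPython's interpreter overhead means the absolute GB/s figures will not match a C tool; only the relative gap is of interest.

```python
import re
import time


def timed_search(search, data: bytes, repeats: int = 3) -> tuple[int, float]:
    """Run search(data) `repeats` times; return (match count, best throughput in GB/s)."""
    best = float("inf")
    count = 0
    for _ in range(repeats):
        start = time.perf_counter()
        count = search(data)
        best = min(best, time.perf_counter() - start)
    return count, len(data) / max(best, 1e-9) / 1e9


def literal_search(needle: bytes):
    # Plain substring scan: little work per byte, so memory bandwidth tends to be the limit.
    return lambda data: data.count(needle)


def regex_search(pattern: bytes):
    # Regex engine: more work per byte, so the CPU tends to be the limit.
    compiled = re.compile(pattern)
    return lambda data: sum(1 for _ in compiled.finditer(data))


if __name__ == "__main__":
    # Hypothetical synthetic haystack (~54 MB) held entirely in RAM.
    data = b"lorem ipsum dolor sit amet " * 2_000_000 + b"needle"
    searches = [("literal", literal_search(b"needle")),
                ("regex", regex_search(rb"n[aeiou]+dle"))]
    for name, fn in searches:
        count, gbps = timed_search(fn, data)
        print(f"{name:8s} matches={count}  {gbps:6.2f} GB/s")
```

Both searches find the same single match, so any difference in throughput comes from how much work each one does per byte.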

If you have ag[1], you can play around with the --workers option to see how various numbers of threads change performance. (The default is for ag to use #CPUs-1 workers.)
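To make the worker-count experiment concrete, here is a minimal Python sketch of a file-level worker pool with an ag-style default of #CPUs-1 workers. It is only an illustration under stated assumptions: ag itself is written in C and its internals differ, and the names `default_workers`, `count_in_file`, and `search_files` are hypothetical. In CPython the per-file `count` holds the GIL, but file reads release it, so extra threads here mainly overlap I/O waits rather than parallelize the scan.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def default_workers() -> int:
    # Mirrors the "#CPUs-1" default described above; never fewer than one worker.
    return max(1, (os.cpu_count() or 1) - 1)


def count_in_file(path: str, needle: bytes) -> int:
    # The read releases the GIL in CPython, so reads from different threads can overlap.
    with open(path, "rb") as f:
        return f.read().count(needle)


def search_files(paths, needle: bytes, workers=None) -> dict:
    """Count occurrences of `needle` in each file using a pool of worker threads."""
    paths = list(paths)  # iterated twice below, so materialize it
    workers = workers or default_workers()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(lambda p: count_in_file(p, needle), paths)
        return dict(zip(paths, counts))
```

Timing `search_files(paths, b"needle", workers=n)` for n = 1, 2, 4, 8 on a cold and then a warm page cache is the Python analogue of trying different `--workers` values with ag.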

1. https://github.com/ggreer/the_silver_searcher

Even when disk caches are involved (i.e. the data is in RAM, not in the CPU cache), a typical grep workload (a short "needle" to search for) should saturate memory bandwidth before it saturates the CPU cores.

Running a few threads/processes in parallel can improve throughput through latency hiding, but adding more beyond that shouldn't give any further benefit.
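The argument can be put as a simple roofline-style bound: aggregate scan rate grows with the number of threads until it hits memory bandwidth, then stays flat. The sketch below assumes each thread's rate already includes its stalls (latency hiding is what lets a few threads actually reach the cap), and the 8 GB/s per thread and 20 GB/s bandwidth figures are hypothetical, not measurements.

```python
import math


def predicted_throughput(threads: int, per_thread_gbps: float, mem_bw_gbps: float) -> float:
    """Aggregate scan rate: linear in thread count until memory bandwidth caps it."""
    return min(threads * per_thread_gbps, mem_bw_gbps)


def saturating_threads(per_thread_gbps: float, mem_bw_gbps: float) -> int:
    """Smallest thread count at which the model reaches the bandwidth cap."""
    return math.ceil(mem_bw_gbps / per_thread_gbps)


if __name__ == "__main__":
    PER_THREAD = 8.0  # hypothetical per-thread scan rate, GB/s
    MEM_BW = 20.0     # hypothetical memory bandwidth, GB/s
    for n in range(1, 7):
        print(n, predicted_throughput(n, PER_THREAD, MEM_BW))
    print(saturating_threads(PER_THREAD, MEM_BW))  # → 3
```

With these numbers the model gives 8, 16, then 20 GB/s for every thread count from 3 upward, which is the "a few threads help, more don't" shape described above.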