When searching for a literal string, the bottleneck tends to be memory bandwidth. When doing regex searches, the bottleneck is usually CPU. If caches are cold, then disk I/O is the limiting factor. Even in that case, technologies like NCQ (Native Command Queuing) let the drive work on several outstanding requests at once, so some degree of concurrency still helps.
If you have ag[1], you can play around with the --workers option to see how various numbers of threads change performance. (The default is for ag to use #CPUs-1 workers.)
Even when the data comes from the disk cache (i.e. it is in RAM, not in CPU cache), a typical grep workload (a short "needle" to search for) should saturate memory bandwidth before it saturates the CPU cores.
Running a few threads/processes in parallel could improve throughput by hiding latency, but adding more beyond that shouldn't give any benefit.
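To make the "few threads, then no further gain" idea concrete, here is a rough Python sketch of splitting a buffer across workers and counting literal matches. The names count_in_range and parallel_count are made up for illustration, and this is not how ag or grep is implemented. On standard CPython the GIL likely serializes the searches, so treat it as a picture of the chunking (including matches that straddle a chunk boundary), not as a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor


def count_in_range(data: bytes, needle: bytes, lo: int, hi: int) -> int:
    """Count matches (overlapping ones included) that start in [lo, hi)."""
    # Extend the window past hi so a match that starts before hi but
    # ends in the next chunk is still found by this worker.
    end = min(len(data), hi + len(needle) - 1)
    count = 0
    pos = data.find(needle, lo, end)
    while pos != -1:
        count += 1
        pos = data.find(needle, pos + 1, end)
    return count


def parallel_count(data: bytes, needle: bytes, workers: int) -> int:
    """Split data into roughly equal chunks, one per worker, and sum the counts."""
    if not needle or workers < 1:
        raise ValueError("needle must be non-empty and workers must be >= 1")
    chunk = max(1, -(-len(data) // workers))  # ceiling division
    ranges = [(lo, min(lo + chunk, len(data))) for lo in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda r: count_in_range(data, needle, r[0], r[1]), ranges))
```

Wrapping parallel_count in time.perf_counter() calls for workers = 1, 2, 4, 8 on a large in-memory buffer is one way to see where extra workers stop helping on a given machine.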
[1] https://github.com/ggreer/the_silver_searcher