by kbenson
4433 days ago
Possibly the OS is doing interesting things with file access and caching, and opting out of that has benefits for this particular workload?

... I just skimmed the BSD mailing list email on why grep is fast, which was linked up-thread, and it seems that's somewhat the case. It sounds like, since they use advanced search techniques to work out what matches or can match, they use mmap to avoid requiring the kernel to copy every byte into memory when they know they only need to look at specific ranges of bytes in some instances. At least that was the case at some point in the past.

  Finally, when I was last the maintainer of GNU grep (15+ years ago...),
  GNU grep also tried very hard to set things up so that the _kernel_
  could ALSO avoid handling every byte of the input, by using mmap()
  instead of read() for file input. At the time, using read() caused
  most Unix versions to do extra copying.

P.S. Nice attitude, it earned an upvote from me. Which is probably one reason why your third account has more karma than my first.
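For anyone who wants to poke at the mmap() versus read() difference the quoted email describes, here is a minimal sketch in Python rather than grep's C. The helper names `find_with_read` and `find_with_mmap` are made up for illustration, and this is not a claim about how GNU grep is actually written. It only shows the two access styles side by side.

```python
import mmap


def find_with_read(path, pattern):
    """Search by copying the whole file into a user-space buffer first."""
    with open(path, "rb") as f:
        data = f.read()  # the kernel copies every byte into this bytes object
    return data.find(pattern)


def find_with_mmap(path, pattern):
    """Search through a read-only mapping of the file.

    Pages are faulted in only when the search actually touches them.
    Note: mapping with length 0 raises ValueError for an empty file.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped.find(pattern)
```

Both functions return the byte offset of the first match, or -1 if there is none. Whether the mmap version is actually faster depends on the OS and the workload, which is more or less the open question in this thread.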
So the assumption is that those pages never even get paged in, but I think that'd only be the case when the pattern size is at least as large as the page size (usually 4KB!), since the search can skip ahead by at most one pattern length at a time. That is not the case in the example in the mailing list. So the mystery continues!
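The page arithmetic can be checked with a rough sketch. It assumes the best case for a Boyer-Moore style search, where each step inspects one byte (the last byte of the current window) and then skips a full pattern length. The function name `pages_touched` is hypothetical, and the model ignores kernel readahead, which may pull in neighboring pages anyway.

```python
def pages_touched(file_size, pattern_len, page_size=4096):
    """Count distinct pages inspected in the best-case skipping scenario.

    Assumption: the search looks at exactly one byte per step, then
    jumps ahead by pattern_len bytes. Real searches touch more bytes.
    """
    touched = set()
    pos = pattern_len - 1  # last byte of the first window
    while pos < file_size:
        touched.add(pos // page_size)
        pos += pattern_len
    return len(touched)
```

For a 64 KiB file (16 pages of 4096 bytes), `pages_touched(65536, 10)` gives 16, so a 10-byte pattern still touches every page. With a pattern of 8192 bytes, `pages_touched(65536, 8192)` gives 8, so only then do whole pages start being skipped.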