Hacker News
by kbenson 4433 days ago
Possibly the OS is doing interesting things with file access and caching and opting out of that has benefits for this particular workload?

...

I just skimmed the BSD mailing list email on why grep is fast, which was linked up-thread, and it seems that's somewhat the case. It sounds like, since they use advanced search techniques to work out which bytes can possibly match, they use mmap to avoid requiring the kernel to copy every byte into memory when they know they only need to look at specific ranges of bytes. At least that was the case at some point in the past.

"Finally, when I was last the maintainer of GNU grep (15+ years ago...), GNU grep also tried very hard to set things up so that the _kernel_ could ALSO avoid handling every byte of the input, by using mmap() instead of read() for file input. At the time, using read() caused most Unix versions to do extra copying."
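
To make the read() vs. mmap() distinction concrete, here is a minimal sketch in Python. It is not how GNU grep does it (that is C), and the function names are made up for illustration. The read() version copies the whole file into a user-space buffer before searching; the mmap() version maps the file and lets pages be faulted in as the search touches them. Note that mapping an empty file raises ValueError, so this assumes a non-empty file.

```python
import mmap


def find_with_read(path, needle):
    """Hypothetical helper: search after copying the file into user space."""
    with open(path, "rb") as f:
        data = f.read()  # the whole file is copied into a bytes object
    return data.find(needle)


def find_with_mmap(path, needle):
    """Hypothetical helper: search the file through a read-only mapping."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; pages are brought in on access.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m.find(needle)
```

Both return the byte offset of the first match or -1, so they are interchangeable for a quick timing comparison on a large file.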

P.S. Nice attitude, it earned an upvote from me. Which is probably one reason why your third account has more karma than my first.

1 comments

Right, I think the point of Boyer-Moore is that it lets the search skip large chunks of the text.
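
To illustrate the skipping, here is a rough sketch of the Horspool simplification of Boyer-Moore (not the exact variant GNU grep uses) that also counts how many text bytes it actually looks at. The function name and the counter are just for illustration.

```python
def horspool_search(text, pattern):
    """Return (index of first match or -1, number of text bytes examined)."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0, 0
    # Bad-character table: how far to shift the window, keyed by the text
    # byte currently aligned with the last position of the pattern.
    shift = {b: m - 1 - i for i, b in enumerate(pattern[:-1])}
    examined = 0
    pos = 0
    while pos + m <= n:
        # Compare the window right to left, stopping at the first mismatch.
        j = m - 1
        while j >= 0:
            examined += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            return pos, examined
        # Bytes that do not occur in the pattern allow a full-length skip.
        pos += shift.get(text[pos + m - 1], m)
    return -1, examined
```

For example, searching for b"needle" in a kilobyte of b"a" followed by b"needle" examines only a fraction of the text, because each mismatch on an "a" moves the window forward by the full pattern length.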

So the assumption is that those pages never even get paged in, but I think that'd only be the case when the pattern is at least as large as the page size (usually 4 KB!), since a single skip can't be longer than the pattern. That is not the case in the example in the mailing list. So the mystery continues!
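
A quick back-of-the-envelope check of that page argument, as a sketch. It assumes a 4096-byte page and the best case for skipping, where the search reads exactly one byte per pattern-length window. The function name is hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size; typical, but not universal


def pages_touched(file_size, pattern_len, page_size=PAGE_SIZE):
    """Best case: one byte read per pattern_len-sized window.

    Returns (number of pages touched, total number of pages).
    """
    touched = {
        offset // page_size
        for offset in range(pattern_len - 1, file_size, pattern_len)
    }
    total = -(-file_size // page_size)  # ceiling division
    return len(touched), total
```

With a 1 MiB file, a 16-byte pattern still touches every page, while an 8192-byte pattern touches only every other page. So for realistic pattern sizes, skipping bytes does not translate into skipping whole pages.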