

by glangdale 2307 days ago

Any good literal search algorithm could do a one-off search for a single long literal 'needle' way faster than the roughly 1GB/s that the author attained with grep.

A single string of that length is extremely easy to search for with a range of different algorithms - I would be surprised if a decent approach couldn't keep up with memory bandwidth (assuming your 22GB file is already, somehow, in memory). The mechanics of simply reading such a big file are likely to dominate in practice.

We implemented some SIMD approaches in https://github.com/intel/hyperscan that would probably work pretty well (effectively a 2-char search followed by a quick confirm) for this case.
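To make the "2-char search followed by a quick confirm" idea concrete, here is a minimal scalar sketch in Python. It is not Hyperscan's code and uses no SIMD. The function name `find_literal` and the choice of the needle's first two bytes as the filter pair are assumptions made for illustration. The point is only the two-stage shape: a cheap filter finds candidate positions, and a full comparison runs only at those candidates.

```python
def find_literal(haystack: bytes, needle: bytes) -> int:
    """Return the index of the first occurrence of needle in haystack, or -1.

    Illustrative two-stage search (hypothetical helper, not Hyperscan's API):
      1. filter: look for a 2-byte pair taken from the needle
      2. confirm: compare the whole needle at each candidate position
    """
    n = len(needle)
    if n < 2:
        # Too short for a 2-byte filter; fall back to the built-in search.
        return haystack.find(needle)

    # Assumption: use the first two bytes as the filter pair. A real
    # implementation might pick a rarer pair to cut down false candidates.
    pair = needle[:2]

    pos = haystack.find(pair)
    while pos != -1:
        # Confirm step: only runs where the cheap filter matched.
        if haystack[pos:pos + n] == needle:
            return pos
        pos = haystack.find(pair, pos + 1)
    return -1
```

With a long needle the confirm step should fire rarely, so the filter stage sets the throughput, and that stage is the part a SIMD implementation would vectorize.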

Of course, all of this presupposes that any kind of whole-text search is actually the right answer here. Assuming you really do have more than a few searches to do, keeping the data in some kind of prebuilt structure is way superior to an ad hoc literal search.
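As a rough illustration of the prebuilt-structure point, the sketch below assumes the file can be treated as a collection of independent records (for example, one per line), which the comment above does not state. The names `build_index` and `contains` are hypothetical. The sort cost is paid once, and each later lookup is a binary search rather than a scan of the whole input.

```python
import bisect


def build_index(records):
    """Build the index once: here, simply a sorted list of the records."""
    return sorted(records)


def contains(index, key):
    """Binary-search the sorted index; O(log n) per lookup instead of a full scan."""
    i = bisect.bisect_left(index, key)
    return i < len(index) and index[i] == key


# Usage: pay the sort cost once, then run as many lookups as needed.
index = build_index([b"cherry", b"apple", b"banana"])
found = contains(index, b"banana")
missing = contains(index, b"durian")
```

For a file too big to hold in memory, the same idea would apply to a sorted file on disk, but that is outside this sketch.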