Hacker News
by kbenson 2124 days ago
That grep is not doing the same thing as the code, nor necessarily what the exercise requires. By default, grep treats every pattern as a regular expression, so it's turning all those entries into individual regexes. You want to use fgrep, or the -F flag, to make it treat all the source patterns as fixed strings.
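To make the regex-versus-fixed-string distinction concrete, here is a minimal Python sketch. It assumes the exercise is "find lines of one file that match entries from another" (the thread doesn't show the exercise or the Python code, so that is a guess), and the function names are hypothetical. It contrasts three semantics: regex search (plain `grep -f`), literal substring search (`grep -F -f`), and exact whole-line equality (`grep -F -x -f`, or a set lookup). Note that `-F` alone still matches substrings anywhere in a line. Only `-x` restricts it to whole lines.

```python
import re

def regex_match(patterns, lines):
    """Lines where any pattern, compiled as a regex, matches somewhere (like plain `grep -f`)."""
    compiled = [re.compile(p) for p in patterns]
    return [line for line in lines if any(c.search(line) for c in compiled)]

def fixed_match(patterns, lines):
    """Lines containing any pattern as a literal substring (like `grep -F -f`)."""
    return [line for line in lines if any(p in line for p in patterns)]

def exact_match(patterns, lines):
    """Lines exactly equal to some pattern (like `grep -F -x -f`, or a Python set lookup)."""
    wanted = set(patterns)
    return [line for line in lines if line in wanted]

# Hypothetical sample data: "." is a regex metacharacter, so "1.5" also matches "125".
patterns = ["1.5", "abc"]
lines = ["125", "1.5", "xabcx", "abc"]

print(regex_match(patterns, lines))  # ['125', '1.5', 'xabcx', 'abc']
print(fixed_match(patterns, lines))  # ['1.5', 'xabcx', 'abc']
print(exact_match(patterns, lines))  # ['1.5', 'abc']
```

The three calls return different results on the same input. A timing comparison between grep and a set-based script is only fair once both are doing the same kind of match.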

In my simple test, that resulted in grep running in 44% of the time it previously required (still more than Python, though).


Apologies for the careless omission. I tested the difference on a larger job: 28s with grep, 22s with fgrep.
I think there's probably a sweet spot depending on how large the files are relative to the method used, because eventually disk access may dominate the running time. Putting the files on a RAM disk (/dev/shm on some distros) would help.

I tested with files just over 2 MB on a small Digital Ocean VM. Judging by your running times, and depending on disk speed, I suspect you ran on files at least an order of magnitude larger. How long did Python take on those? Seeing memory usage from time might be illuminating for these tasks too. Using 4x the file's on-disk size in memory is fine for a file of a couple MB, but less so for one of a couple GB (in which case building a Bloom filter or trie might be better, but I really have no idea whether Python's set functions do that already).
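For reference, CPython's `set` is a hash table that stores the strings themselves, so it is not a Bloom filter or trie. Below is a minimal Bloom filter sketch in pure Python to show what trading exactness for memory would look like. The class name, the SHA-256-based double hashing, and the sizing defaults are my own choices for illustration, not anything from the original code. Membership tests have no false negatives but can return false positives at roughly `error_rate`, so a "hit" would still need confirming, e.g. with a second pass over the candidate lines.

```python
import hashlib
import math

class BloomFilter:
    """Approximate set membership: no false negatives, tunable false-positive rate."""

    def __init__(self, capacity, error_rate=0.01):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hash functions.
        self.size = max(1, math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.size / capacity * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)  # bit array packed into bytes

    def _positions(self, item):
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # force odd so steps vary
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True only if every one of the k bits is set.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

# Hypothetical usage: load one file's lines into the filter, then stream the other file.
bf = BloomFilter(capacity=1000, error_rate=0.01)
for i in range(1000):
    bf.add(f"line{i}")
print("line42" in bf)  # True: items that were added are always reported present
```

With the sizing formula above, one million entries at a 1% error rate need about 9.6 million bits, i.e. roughly 1.2 MB for the bit array, independent of how long the lines are. That is the property that would matter for multi-GB inputs. The pure-Python hashing is slow, though, so this trades CPU time for memory.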