Hacker News
by 1vuio0pswjnm7 2130 days ago
Apologies for the careless omission. I tested the difference on a larger job: grep took 28s and fgrep took 22s.

I think there's probably a sweet spot between file size and the method used, because past some point disk access may dominate the running time. Putting the files on a RAM disk (/dev/shm on some distros) would help.
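To see whether disk access or the in-memory work dominates, one option is to time the two phases separately. This is a minimal Python sketch, assuming the job is printing the lines of one file that also appear in another (roughly what `grep -F -x -f patterns.txt target.txt` does); the function name `time_phases` and the file paths are hypothetical.

```python
import time


def time_phases(pattern_path: str, target_path: str) -> dict:
    """Time file reading and in-memory matching separately.

    Assumed task: count lines of target_path that exactly match
    some line of pattern_path (both paths are hypothetical).
    """
    t0 = time.perf_counter()
    # I/O phase: read both files fully into memory as lists of lines.
    with open(pattern_path, "rb") as f:
        patterns = f.read().splitlines()
    with open(target_path, "rb") as f:
        targets = f.read().splitlines()
    t1 = time.perf_counter()

    # Compute phase: hash-based exact-line membership test.
    wanted = set(patterns)
    matches = [line for line in targets if line in wanted]
    t2 = time.perf_counter()

    return {"read_s": t1 - t0, "match_s": t2 - t1, "matches": len(matches)}
```

A second run will usually be served from the page cache, so comparing a cold run with a warm one (or with the files copied to /dev/shm) gives a rough idea of how much of the total is disk.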

I tested with files just over 2 MB on a small Digital Ocean VM. Judging by your running times, and depending on disk speed, I suspect your files were at least an order of magnitude larger. How long did Python take on those? The memory usage reported by time might be illuminating for these tasks too. Using 4x the file size in memory is fine for a file of a couple of MB, but less so for one of a couple of GB (in which case a Bloom filter or trie might be better, though I really have no idea whether Python's set functions already do something like that).
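For reference, CPython's `set` is a hash table that stores every element, so it does not shrink memory on its own. Below is a minimal Bloom filter sketch in Python, assuming the same exact-line lookup task; the class and the `sized_for` helper are illustrative, not from any library. It uses double hashing over one BLAKE2b digest and the standard sizing formulas m = -n ln p / (ln 2)^2 bits and k = (m/n) ln 2 hash functions.

```python
import hashlib
import math


class BloomFilter:
    """Probabilistic set: no false negatives, tunable false-positive rate.

    Memory is a fixed array of m bits regardless of line length.
    """

    def __init__(self, m_bits: int, k_hashes: int) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Double hashing: derive k bit positions from two 64-bit halves.
        digest = hashlib.blake2b(item, digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # force h2 nonzero
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(item))


def sized_for(n_items: int, fp_rate: float) -> BloomFilter:
    """Pick m and k for n_items at the target false-positive rate."""
    m = math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))
    k = max(1, round(m / n_items * math.log(2)))
    return BloomFilter(m, k)
```

At a 1% false-positive rate this works out to about 9.6 bits per item, roughly 1.2 MB per million lines however long the lines are. Because a hit can be a false positive, an exact answer needs a second pass that verifies the candidate matches.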