I think theres probably a sweet spot in how large the files are compared to the method used, because eventually disk access may dominate the running time. Putting files on a ram disk (/dev/shm on some distros) would help.
I tested with files just over 2 MB on a small Digital Ocean VM. Depending on disk speed, based on running time I suspect you ran on files at least an order of magnitude larger. What time did python run in for those? Seeing memory usage from time might be illuminating for these tasks too. Using 4x the disk size in memory is fine for a couple MB file, but less so for a couple GB file (in which case creating a bloom filter or trie might be better, but I really have no idea if Pythons set functions do that already).
I tested with files just over 2 MB on a small Digital Ocean VM. Depending on disk speed, based on running time I suspect you ran on files at least an order of magnitude larger. What time did python run in for those? Seeing memory usage from time might be illuminating for these tasks too. Using 4x the disk size in memory is fine for a couple MB file, but less so for a couple GB file (in which case creating a bloom filter or trie might be better, but I really have no idea if Pythons set functions do that already).