Hacker News
by setquk 2942 days ago
Grep 150,000 source files on Linux and on WSL and come back to me. That's a pretty standard console load. It kills git operations, package managers, compilers, the lot: ALL the tools I use WSL for. Well, USED it for.

Real data:

Linux -> 1.3 seconds, all from the buffer cache, on a cranky 10-year-old HP desktop with 8 GB of RAM and a bottom-end SSD.

WSL -> over a minute, every time, on a 12-core i7 with a high-end M.2 SSD.

This is because of NTFS's awful performance on small files. The whole of Unix is file-based and uses huge numbers of small files, as does source code in general, so this is an endgame scenario for the platform. It simply sucks!

This goes back to when we used SVN, which took 6-7 minutes to check out a repo onto NTFS versus 20-30 seconds onto ext4 on the same spinning-rust disks. The organisation treated SVN like cancer for what is fundamentally a platform limitation.
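For anyone who wants to reproduce this kind of comparison, here is a minimal Python sketch (not from the thread). It times creating many small files, roughly what a checkout does, and then reading and searching every one of them, roughly what a recursive grep does. The file count, file size, directory layout, search string and the script name `smallfiles.py` are all arbitrary assumptions; point it at a directory on each filesystem you want to compare, for example somewhere under `/mnt/c` when running inside WSL versus an ext4 path on Linux.

```python
# Hypothetical small-file microbenchmark (not from the thread): time creating
# many small files (roughly what a VCS checkout does), then reading and
# searching every one of them (roughly what a recursive grep does).
import os
import sys
import tempfile
import time
from typing import Dict, Optional, Tuple

# Arbitrary C-like line used to fill each file; the search looks for a symbol in it.
LINE = b"int example_function(void) { return 42; }\n"


def create_files(root: str, count: int, size: int = 2048, per_dir: int = 500) -> None:
    """Write `count` files of `size` bytes each, `per_dir` files per subdirectory."""
    payload = (LINE * (size // len(LINE) + 1))[:size]
    for i in range(count):
        subdir = os.path.join(root, f"d{i // per_dir:04d}")
        if i % per_dir == 0:
            os.makedirs(subdir, exist_ok=True)  # start a new subdirectory
        with open(os.path.join(subdir, f"f{i:06d}.c"), "wb") as fh:
            fh.write(payload)


def search_tree(root: str, needle: bytes) -> Tuple[int, int]:
    """Read every file under `root`; return (files scanned, files containing `needle`)."""
    scanned = matched = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            with open(os.path.join(dirpath, name), "rb") as fh:
                if needle in fh.read():
                    matched += 1
            scanned += 1
    return scanned, matched


def run(count: int, base: Optional[str] = None) -> Dict[str, float]:
    """Create `count` files in a temp dir under `base`, search them, report timings."""
    with tempfile.TemporaryDirectory(dir=base) as root:
        t0 = time.perf_counter()
        create_files(root, count)
        t1 = time.perf_counter()
        scanned, matched = search_tree(root, b"example_function")
        t2 = time.perf_counter()
    return {"files": scanned, "matched": matched,
            "create_s": t1 - t0, "search_s": t2 - t1}


if __name__ == "__main__":
    # usage: python3 smallfiles.py [COUNT] [DIR]   (DIR selects the filesystem under test)
    n = int(sys.argv[1]) if len(sys.argv) > 1 else 10_000
    target = sys.argv[2] if len(sys.argv) > 2 else None
    r = run(n, target)
    print(f"{r['files']} files: create {r['create_s']:.2f}s, search {r['search_s']:.2f}s")
```

Run it as, say, `python3 smallfiles.py 150000 /some/dir` on each side. The search phase runs right after the files are written, so it will mostly be served from the OS cache, the same warm-cache situation as the 1.3 second Linux figure above; it measures per-file open/read overhead rather than raw disk speed. Real-time antimalware scanning on Windows can also dominate both phases.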


As the maintainer of ripgrep, I pay attention to these sorts of things. I did notice a similar performance problem in my own tests, but further investigation revealed that Windows' antimalware process was severely throttling file reads. Once I disabled it, performance on Windows was nowhere near an order of magnitude worse than on Linux.
Thanks for commenting. I have disabled Windows Defender on the machine and tweaked the filesystem with fsutil, and it still takes over 40 seconds on the same workload.
Source code files are small... but there usually aren't that many of them. I mostly run extensive searches over big files, and ripgrep is great for that (and it's available for Windows).
I just use `git grep`, which isn't as fast as ripgrep (https://github.com/BurntSushi/ripgrep) but is still damn fast on Windows.

  $ find ~/src | wc -l
  242341
Maybe, maybe not.