Hacker News
by thrashh 1018 days ago
I mean you can design a filesystem to handle a million files extremely quickly... it just has to be in the requirements up front.

But there will be some trade-off.

And I don't think people generally put "a million files" in the requirements because it's fairly rare.

3 comments

Not related to git (I hope), but a lot of scientific data/imaging folks seem to think file abstractions are free. I've seen more than one stack explode a _single_ microscope image into 100k files, so you'd hit 1M after trying to store just 10 microscope slides. Then, a realistic archive with thousands of images can hit a billion files before you know it.

It's hard to get people past the "works for me" demo phase, when they have only played with one image, and to realize they really need a reasonable container format to play nice with the systems world outside their one task.
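To make the numbers and the container idea concrete, here is a rough sketch in Python. The 100k files-per-image figure is the one quoted above; the 10,000-image archive, the tile names, and the choice of an uncompressed zip as the container are all my own assumptions for illustration, not what any particular imaging stack does.

```python
import io
import zipfile

FILES_PER_IMAGE = 100_000  # figure quoted in the comment above


def files_needed(num_images, files_per_image=FILES_PER_IMAGE):
    """Count the loose files created when every tile is its own file."""
    return num_images * files_per_image


def pack_tiles(tiles, dest):
    """Write all tiles of one image into a single uncompressed zip container.

    `tiles` maps a (hypothetical) tile name to its raw bytes; `dest` is a
    path or a binary file-like object.
    """
    with zipfile.ZipFile(dest, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, data in tiles.items():
            zf.writestr(name, data)


def read_tile(src, name):
    """Read one tile back out of the container by name."""
    with zipfile.ZipFile(src) as zf:
        return zf.read(name)


if __name__ == "__main__":
    print(files_needed(10))      # 10 slides
    print(files_needed(10_000))  # assumed archive size

    # Toy 4x4 grid of tiles standing in for one exploded image.
    tiles = {
        f"tile_{r:03d}_{c:03d}.raw": bytes([r, c]) * 8
        for r in range(4)
        for c in range(4)
    }
    buf = io.BytesIO()
    pack_tiles(tiles, buf)
    print(read_tile(buf, "tile_002_003.raw") == tiles["tile_002_003.raw"])
```

The filesystem then sees one file per image instead of 100k, while individual tiles stay addressable by name through the archive's own index.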

I was referring to general-purpose filesystems in common use today. Yes, there are a lot of special-purpose and experimental filesystems that are optimized for certain use cases, and a competent systems programmer could write one optimized specifically for small files, but these all have to make significant trade-offs.

It used to be much rarer. With 20 TB drives available today, it is much more common to need to handle far more files. When I designed my file system replacement (www.Didgets.com), I didn't just put 'a million files' in the requirements; I put in 100x more.

Now I have a system that will find subsets in just a second or two (even when the whole set contains hundreds of millions of files and any given subset might contain hundreds of thousands of matches). Here is a short video of a demo: https://www.youtube.com/watch?v=dWIo6sia_hw