HN Mirror


by thrashh 1066 days ago

I mean you can design a filesystem to handle a million files extremely quickly... it just has to be in the requirements up front.

But there will be some trade-off.

And I don't think people generally put "a million files" in the requirements because it's fairly rare.

3 comments

saltcured 1065 days ago

Not related to git (I hope), but a lot of scientific data/imaging folks seem to think file abstractions are free. I've seen more than one stack explode a _single_ microscope image into 100k files, so you'd hit 1M after trying to store just 10 microscope slides. Then, a realistic archive with thousands of images can hit a billion files before you know it.
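To make the scaling concrete, here is a back-of-the-envelope sketch of that arithmetic. The 100k-files-per-image figure is taken from the comment above; the archive sizes and the `total_files` helper are illustrative assumptions, not measurements from any real system.

```python
# Rough file-count arithmetic, assuming each image explodes into 100k files
# (the figure quoted in the comment above).
FILES_PER_IMAGE = 100_000


def total_files(num_images, files_per_image=FILES_PER_IMAGE):
    """Return how many files an archive of num_images images produces."""
    return num_images * files_per_image


# Hypothetical archive sizes: a handful of slides up to a large archive.
for n in (10, 1_000, 10_000):
    print(f"{n:>6} images -> {total_files(n):,} files")
```

Under these assumptions, 10 images already reach a million files, and an archive needs about 10,000 images to reach a billion.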

It's hard to get people past the demo phase "works for me" when they have played with one image, to realize they really need a reasonable container format to play nice with the systems world outside their one task.
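As a minimal sketch of what a "container format" could look like in practice, the snippet below packs many small tiles into a single ZIP archive using Python's standard `zipfile` module, so the filesystem sees one file instead of thousands. ZIP is only a stand-in here: the comment does not name a format, and the tile names, tile contents, and the `pack_tiles` / `read_tile` helpers are hypothetical.

```python
import io
import zipfile


def pack_tiles(tiles, archive):
    """Write every (name, data) tile into one ZIP container.

    `archive` can be a path or a binary file-like object. Tiles are stored
    uncompressed, on the assumption that image data is already compressed.
    """
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, data in tiles:
            zf.writestr(name, data)


def read_tile(archive, name):
    """Read a single tile back by name without extracting the others."""
    with zipfile.ZipFile(archive, "r") as zf:
        return zf.read(name)


# Hypothetical 10x10 grid of tiny tiles standing in for one exploded image.
tiles = [
    (f"tile_{row:03d}_{col:03d}.raw", bytes([row, col]) * 8)
    for row in range(10)
    for col in range(10)
]

container = io.BytesIO()  # in-memory here; a real archive would be one file on disk
pack_tiles(tiles, container)
```

A single tile can then be fetched with `read_tile(container, "tile_003_004.raw")`, since the ZIP central directory lets the reader seek to one member. The trade-off is that updating a tile in place is harder than overwriting a loose file.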


bityard 1058 days ago

I was referring to general-purpose filesystems in common use today. Yes, there are a lot of special-purpose and experimental filesystems which are optimized for certain use cases, and a competent systems programmer could write one optimized specifically for small files, but these all have to make significant trade-offs.


didgetmaster 1065 days ago

It used to be much rarer. With 20 TB drives available today, it is much more common to be able to handle many more files. When I designed my file system replacement (www.Didgets.com), I didn't just put 'a million files' in the requirements; I put in 100x more.

Now I have a system that will find subsets in just a second or two (even when the whole set contains hundreds of millions and any given subset might contain hundreds of thousands of matches). Here is a short video of a demo: https://www.youtube.com/watch?v=dWIo6sia_hw
