by propa 2762 days ago
Your calculation (`400K * 8 bytes = ~3 MB`) is way off. What would be the point of storing only the size? You need to map it back to the file.

Spread across 400K files, 60 MB gives you about 150 bytes per file for the path (or name) and its size, which sounds plausible.
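A quick check of both numbers. The 400K file count and the 60 MB figure come from the thread; treating MB as 10^6 bytes is an assumption.

```python
# Back-of-the-envelope numbers from the thread: ~400K files, ~60 MB in use.
FILES = 400_000
SIZE_FIELD_BYTES = 8          # one 64-bit size per file
OBSERVED_BYTES = 60_000_000   # ~60 MB, assuming decimal megabytes

sizes_only = FILES * SIZE_FIELD_BYTES     # storing nothing but the sizes
per_file_budget = OBSERVED_BYTES / FILES  # what 60 MB allows per file

print(f"sizes only: {sizes_only / 1e6:.1f} MB")          # sizes only: 3.2 MB
print(f"budget per file: {per_file_budget:.0f} bytes")   # budget per file: 150 bytes
```

The sizes alone fit in about 3 MB, but they are useless without a way to map each size back to its file; that mapping is what the remaining budget pays for.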


Maybe you shouldn't store the full file path, just the name and a parent pointer. That brings you down to an 8-byte size + a parent pointer + a short string. For the string, you can use offsets into a string pool (a memory chunk containing zero-terminated strings).
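A minimal sketch of that layout, assuming each entry keeps only its size, its parent's index, and an offset into one shared pool of NUL-terminated names. The `Entry` and `Tree` names are hypothetical, and Python objects are not compact themselves; the code only models what a packed C array would hold.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    size: int      # would be an 8-byte field in a packed layout
    parent: int    # index of the parent entry, -1 for the root
    name_off: int  # offset of this entry's name inside the string pool


class Tree:
    """Hypothetical compact tree: names live once, in a shared string pool."""

    def __init__(self) -> None:
        self.pool = bytearray()           # concatenated zero-terminated names
        self.entries: list[Entry] = []

    def add(self, name: str, size: int, parent: int) -> int:
        off = len(self.pool)
        self.pool += name.encode() + b"\0"   # append name plus NUL terminator
        self.entries.append(Entry(size, parent, off))
        return len(self.entries) - 1

    def name(self, idx: int) -> str:
        off = self.entries[idx].name_off
        end = self.pool.index(0, off)        # position of the terminating NUL
        return self.pool[off:end].decode()

    def path(self, idx: int) -> str:
        # Rebuild the full path on demand by walking parent pointers to the root.
        parts = []
        while idx != -1:
            parts.append(self.name(idx))
            idx = self.entries[idx].parent
        return "/".join(reversed(parts))


t = Tree()
root = t.add("", 0, -1)                       # "/" stored as an empty name
home = t.add("home", 0, root)
f = t.add("video.mkv", 4_000_000_000, home)
print(t.path(f))  # /home/video.mkv
```

Full paths are never stored; they are recomputed only when needed (e.g. when displaying or deleting an entry), which trades a short pointer walk for a large memory saving.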

So I think 50 bytes per file is easily achievable if (name/parent/path) + size is all you want to cache. To speed things up, I would add another 4- or 8-byte index mapping each directory to its first child.
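One way to read the first-child index, sketched under assumptions not stated in the thread: entries are stored so that each directory's children are contiguous, and the fixed record is an 8-byte size plus three 4-byte indices (parent, name offset, first child). `RECORD`, `bytes_per_file` and `children` are hypothetical names.

```python
import struct

# Assumed packed record: 8-byte size, 4-byte parent index,
# 4-byte string-pool offset, 4-byte first-child index ("<" = no padding).
RECORD = struct.Struct("<QIII")


def bytes_per_file(avg_name_len: int) -> int:
    # Fixed record + name bytes + 1 NUL terminator in the string pool.
    return RECORD.size + avg_name_len + 1


def children(parents: list[int], first_child: list[int], d: int) -> range:
    # Assumes children of each directory are stored contiguously.
    # first_child[d] is the index of d's first child, or -1 if it has none
    # (a packed unsigned field would need a sentinel such as 0xFFFFFFFF).
    start = first_child[d]
    if start < 0:
        return range(0)
    end = start
    while end < len(parents) and parents[end] == d:
        end += 1
    return range(start, end)


# Entry 0 is the root; 1 and 2 are its children; 3, 4, 5 are children of 1.
parents = [-1, 0, 0, 1, 1, 1]
first_child = [1, 3, -1, -1, -1, -1]
print(list(children(parents, first_child, 1)))  # [3, 4, 5]
print(bytes_per_file(20))                       # 41
```

With a 20-byte record and names averaging 20 characters, that is about 41 bytes per file, inside the 50-byte estimate. Listing a directory then costs a scan over its own children instead of the whole array.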

I can think of at least three algorithms that would use far less memory, even with a static snapshot of the complete subtree. But all of them break down because the filesystem is supposed to stay online and keep changing. Realistically, a snapshot is useful for scanning an external disk to find the largest file and the like, but it won't be accurate on a live server or on a desktop with several ongoing downloads, video multiplexing, etc.

Earlier in the thread, you suggested that it's hard to justify `ncdu` using 60 MB when storing 400K 8-byte numbers takes only about 3 MB. The number you came up with is just silly and overlooks the actual complexity of the problem.

Given that you are making an implicit judgement about the other program, don't be sloppy with your estimates.

> don't be sloppy with your estimates

I'm not. You can check it, and I'm sure you will eventually arrive very close to that approximation.

I've been a big-time fan of `ncdu` for years and even wrote about it in an e-journal once. Maybe that's why the sharp adjectives were harder to digest. Anyway, good luck!