by yorhel 2766 days ago
I appreciate your work, but you're not being very honest with your claims.

nnn is not keeping information about 400K files in memory in that benchmark. As a result, the rescan is necessary when changing directory. The rescan may be fast in many cases and in some cases it may even be what you'd want, but I can also name many cases where you certainly won't want it (large NFS mounts being one example).

Sorry for the pedantry. I spent a fair amount of time optimizing ncdu's memory usage, so I tend to have an opinion on this topic. :)

I think we are saying the same thing in different lingo. I am trying to say, you do not need to store it if you can have fast rescans.

Coming to memory usage, if you store the sizes of every file you need 400K * 8 bytes = ~3 MB.

Now `ncdu` uses ~60 MB and `nnn` uses ~3.5 MB. How do you justify that huge gap?
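For reference, the figures quoted here can be checked with a quick sketch. It assumes decimal units (K = 1,000 and MB = 1,000,000 bytes), which the thread does not specify, and the names `FILES`, `SIZE_FIELD_BYTES` and `per_file_budget` are made up for illustration.

```python
# Assumption: decimal units; all input figures are the ones quoted in this thread.
FILES = 400_000          # "400K files" in the benchmark under discussion
SIZE_FIELD_BYTES = 8     # one 64-bit size value per file


def per_file_budget(total_bytes, files=FILES):
    """Average number of bytes of memory available per file."""
    return total_bytes / files


# Memory needed if only an 8-byte size were stored for every file.
sizes_only_mb = FILES * SIZE_FIELD_BYTES / 1_000_000

# Per-file budget implied by the reported memory usage of each tool.
ncdu_per_file = per_file_budget(60_000_000)   # ~60 MB reported for ncdu
nnn_per_file = per_file_budget(3_500_000)     # ~3.5 MB reported for nnn

print(f"sizes only: {sizes_only_mb:.1f} MB")
print(f"ncdu: {ncdu_per_file:.2f} bytes per file")
print(f"nnn:  {nnn_per_file:.2f} bytes per file")
```

Under these assumptions the sizes alone come to 3.2 MB, and the reported totals work out to 150 bytes per file for `ncdu` and 8.75 bytes per file for `nnn`.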

> but you're not being very honest with your claims

No, I am completely honest within the limits of my technical understanding. Your tool uses 57 MB extra which would be considerable on a Raspberry Pi model B. To an end user, it's not important how a tool shows the du of `/`, what's important is - is the tool reasonable or not? I don't know how `ncdu` manages the memory within, I took a snapshot of the memory usage at `/`.

In fact, now I have questions about your very first line beginning with `This looks incredibly cool` and then the comparisons of it with different utilities in negative light. (I must be a fool realizing it now, I should have seen it coming.)

And I'm saying you can't have fast rescans in all cases - it very much depends on the filesystem and directory structure.

I'm not trying to downplay nnn - I meant it when I said it's a cool project! I'm saying each project has its strengths and weaknesses, but your marketing doesn't reflect that (or I missed it).

ncdu's memory usage is definitely its weak point - that's not news to me - but it's because I chose the other side of the trade-off: No rescans. If you're curious where that 60MB goes to, it's 400K of 'struct dir's: https://g.blicky.net/ncdu.git/tree/src/global.h?id=d95c65b0#...

I honestly don’t understand why you’re getting downvoted when all you’re doing is explaining the design decisions behind your own utility.

You’re being very snarky considering how quick you were to start the debate, where you boasted in your GitHub README about how much better optimised your tool was.

I grow rather tired of comparisons where one tool tries to make itself look better than another based purely on a solitary arbitrary metric like memory usage. It’s not a fair benchmark and really it’s just an excuse to make yourself look better by bad-mouthing someone else’s code.

What’s to say the other tools haven’t employed the same algorithms you vaguely stipulated you had (I say “vaguely” because you don’t even state which highly optimised algorithms you’ve used)? Have you read the source code of the other projects you’ve commented on to check they’re not doing the same, or even better? Because your README is written like you’re arguing that other projects are not optimised.

What’s to say that the larger memory usage (which is still peanuts compared to most file managers) isn’t because the developer optimised performance over literally just a few extra KB of system memory? Their tool might run circles around yours for FUSE file systems, network mounted volumes, external storage devices with lower IOPS, etc.

But no, instead you get snarky when the author of one of the tools you were so ready to dismiss as being worse starts making more detailed points about practical, real-world usage beyond indexing files on an SSD.

It wasn't a debate. You asked, I answered. And if you read carefully, there is _not_a_single_comment_ on the quality of a single other utility in the README. We recorded what we saw and I have shared the reason why.

I am not going to respond any further and would appreciate it if you refrain from getting personal with "not being completely honest", "being very snarky" etc. Please don't judge me by the project page of a utility which is a work of several contributors. That's all.

I’m not related to the GP.

Let me explain the point further:

Your readme has a performance section, and that section focuses on nnn vs two other tools. You only benchmark memory usage under normal circumstances (i.e. no other performance metric, no other file systems or device types, etc). Then you have a whole other page dedicated to “why is nnn so much smaller”, which is linked directly from the performance comparisons. There’s no way to take that other than that you’re directly comparing nnn to other tools and objectively saying it’s better.

So with that in mind, I think the developers of the other tools are totally within their rights to challenge you on your claims.

Edit: the “multiple contributors” point you made is also rather dishonest. It’s your personal GitHub account for a project you chiefly contribute to, and the documents in question were created and edited by yourself (according to git blame). Yes, nnn has other contributors too, but it was yourself who wrote and published the claims being questioned.

> totally within their rights to challenge you on your claims

Yes, and within the limits of common courtesy.

The other utility does only one thing - reports disk usage so there's not much to compare. The dev did mention that `ncdu's memory usage is definitely its weak point`.

> no other performance metric, no other file system nor device types

Because lstat64() is at the core of the performance metric of the feature we are comparing here, and with the same number of files on the same storage device the number of accesses is exactly the same. The only metric that differentiates the utilities is memory usage.
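To make the "one lstat per entry" point concrete, here is a minimal sketch of a disk usage walk. It is not the code of either utility: `scan` is a hypothetical helper, Python's `os.lstat` stands in for lstat64(), and it sums apparent file sizes (`st_size`) rather than allocated blocks.

```python
import os
import stat


def scan(root):
    """Walk the tree under root without following symlinks.

    Returns (apparent_bytes, lstat_calls). Only regular files contribute
    to apparent_bytes; every entry costs exactly one lstat call.
    """
    total = 0
    calls = 0
    stack = [root]
    while stack:
        path = stack.pop()
        st = os.lstat(path)          # one stat call per entry, symlinks not followed
        calls += 1
        if stat.S_ISREG(st.st_mode):
            total += st.st_size
        elif stat.S_ISDIR(st.st_mode):
            try:
                with os.scandir(path) as entries:
                    stack.extend(entry.path for entry in entries)
            except OSError:
                pass                 # unreadable directory: skip its children
    return total, calls
```

The call count depends only on the number of entries in the tree, which is the sense in which two tools scanning the same tree make the same number of accesses. How long each call takes still depends on the filesystem underneath.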

> Edit: the “multiple contributors” point you made is also rather dishonest too.

Not really, I prefer to edit the readme myself because I want to keep the documentation clean. You will see features contributed by other devs for which I have written the docs from readme to manpage. Regarding the metrics, sometimes I have taken the data and sometimes I have requested someone else to collect it. Or doesn't that count as contribution?

What I actually care about most in a file manager is how it performs on mounts with low IOPS and how gracefully it handles timeouts and other exceptions.

RAM is cheap and any file manager will be snappy on an SSD. But edge cases are where most file managers fall apart yet are situations where you might need to depend on your tools the most.

However, now that I understand the point of this project was purely to optimise memory usage, I can better understand the arguments you were making.

> or doesn't that count as contribution?

Not in this case, no. You published it, so you’re still ultimately accountable for it.

You cannot request figures then play the “nothing to do with me guv’” card when someone queries the benchmarks that you subsequently published. At best it comes across as an unlikely story; at worst you’re still complicit.

Your calculation (`400K * 8 bytes = ~3 MB`) is way off. What would be the point of storing only the size? You need to map it back to the file.

60MB gives you about 150 bytes for file path or file name and its size, which sounds plausible.

Maybe you shouldn't store the file path but just the name, and a parent pointer. That brings you down to 8 bytes size + Parent pointer + a short string. Regarding the string you can go for offsets into a string pool (memory chunk containing zero terminated strings).

So I think 50 bytes per file is easy to accomplish if (name/parent/path) + size is all you want to cache. For speed-up, I would add another 4 or 8 bytes index to map each directory to its first child.
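The layout described above (name offset into a string pool, parent pointer, 8-byte size, first-child index) can be sketched as follows. This is only an illustration under stated assumptions, not how ncdu or nnn store entries: `CompactTree` and `NO_ENTRY` are hypothetical names, indexes are 4 bytes (`array("I")` on mainstream platforms), and finding all children via `first_child` would additionally require siblings to be stored contiguously.

```python
from array import array

NO_ENTRY = 0xFFFFFFFF  # sentinel index meaning "no parent" or "no child"


class CompactTree:
    """Column-oriented file tree: fixed-width fields plus one shared string pool."""

    def __init__(self):
        self.pool = bytearray()        # zero-terminated names, stored back to back
        self.name_off = array("I")     # offset of each entry's name in the pool
        self.parent = array("I")       # index of the parent entry
        self.size = array("Q")         # 8-byte size per entry
        self.first_child = array("I")  # index of the first child added, if any

    def add(self, name, size, parent=NO_ENTRY):
        idx = len(self.size)
        self.name_off.append(len(self.pool))
        self.pool += name.encode() + b"\0"
        self.parent.append(parent)
        self.size.append(size)
        self.first_child.append(NO_ENTRY)
        if parent != NO_ENTRY and self.first_child[parent] == NO_ENTRY:
            self.first_child[parent] = idx
        return idx

    def name(self, idx):
        start = self.name_off[idx]
        end = self.pool.index(0, start)   # find the terminating zero byte
        return self.pool[start:end].decode()

    def path(self, idx):
        parts = []
        while idx != NO_ENTRY:            # follow parent pointers up to the root
            parts.append(self.name(idx))
            idx = self.parent[idx]
        return "/".join(reversed(parts))

    def bytes_per_entry(self):
        fixed = (self.name_off.itemsize + self.parent.itemsize
                 + self.size.itemsize + self.first_child.itemsize)
        return fixed + len(self.pool) / len(self.size)


tree = CompactTree()
root = tree.add(".", 0)
src = tree.add("src", 0, root)
main_c = tree.add("main.c", 1234, src)
print(tree.path(main_c), tree.bytes_per_entry())
```

With 4-byte indexes the fixed part is 20 bytes per entry, so the average name length (plus its terminator) decides whether the total stays under the 50-byte estimate.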

I can think of at least 3 possible algorithms to use much much less memory even with a static snapshot of the complete subtree. And all of them are broken because the filesystem is supposed to stay online and change. It's realistically useful to scan an external disk to find the largest file etc., but not accurate on a live server, a desktop with several ongoing downloads, video multiplexing etc.

Earlier in the thread you suggested that it's hard to justify ncdu using 60MB while it takes only 3.5MB to store 400K * 8 bytes numbers. The number you came up with is just silly and overlooks the actual complexity of the problem.

Given that you are making an implicit judgement about the other program, don't be sloppy with your estimates.

> don't be sloppy with your estimates

I'm not. You can, and I'm sure eventually you will arrive very close to the approximation.

I'd been a big time fan of `ncdu` for years and even wrote in an e-journal about it once. Maybe that's why the sharp adjectives became more difficult to digest. Anyway, good luck!