by laumars 2763 days ago
What I actually care most about a file manager is how they perform on mounts with low IOPS and how gracefully they handle time outs and other exceptions.

RAM is cheap and any file manager will be snappy on an SSD. But edge cases are where most file managers fall apart, and those are the situations where you might need to depend on your tools the most.

However, now that I understand the point of this project was purely to optimise for low memory usage, I can better understand the arguments you were making.

> or doesn't that count as contribution?

Not in this case, no. You published it, so you’re still ultimately accountable for it.

You cannot request figures then play the “nothing to do with me guv’” card when someone queries the benchmarks that you subsequently published. At best it comes across as an unlikely story; at worst you’re still complicit.


>RAM is cheap

This is the wrong mindset. RAM is only cheap if you don't use it. As soon as you go just 1 byte over the maximum RAM it turns into the most precious resource of the entire computer.

If an app uses more memory than another then it is not better because RAM is cheap. It is better because it provides more or higher quality features at a reasonable cost of increased memory usage. But at the same time it is also worse for people who do not need those features.

Here is an example: when I launch the graphical file manager Nautilus, it consumes roughly 26 MB of RAM showing my home folder, but when I go to my "Pictures" folder it suddenly shoots up to 300 MB. There is a reason for that and it is not "RAM is cheap"; if that were the case, it would always use 300 MB regardless of what I do with it (Electron apps are a major offender here). Nautilus consumes that much RAM because it has more features, like displaying 1000 thumbnails of all those pictures.

Now this feature would get in my way if I set up something like a Raspberry Pi Zero to take a photo every hour. Nautilus will crash because it needs too much memory to display the thumbnails.

I agree from an idealistic point of view (I've often made the same argument myself with regards to non-native applications and self-indulgent GUIs) but you're missing the context of the argument here.

We're not talking about burning through hundreds of megabytes (nor even gigs) on a pretty UI that adds no extra function; we are talking about only a few megabytes to save stressing low bandwidth endpoints.

It isn't 1990 any more; sacrificing a few megabytes of RAM in favour of greater stability is very much a worthwhile gain in my opinion. Hence why I say RAM is cheap - we don't need to cut every corner just to save a few bytes here and there.

In the example we were discussing, the idle RAM usage was higher because it doesn't hammer low bandwidth endpoints with frequent rescans. Caching is actually a pretty good use for spare RAM - your kernel does it too. So we are not talking about anything out of the ordinary here and we're certainly not talking about wasting RAM for the sake of wasting RAM. We're talking about trading some cheap RAM for the sake of potentially expensive (computationally speaking) access times.

However, I do feel I need to reiterate a point I've made throughout this discussion: there is no right or wrong way; it's just a matter of compromise. eg the goals for an embedded system would be different to the goals for a CLI tool on a developer's laptop.

> then play the “nothing to do with me guv’” card when someone queries the benchmarks that you subsequently published

No, you are cooking things up. I did respond as per my understanding. My statement was very clear - "_Please don't judge me_ by the project page of a utility which is a work of several contributors."

I had problems with the _personal remarks_. And I am not surprised you chose to ignore that and describe it as though I have a problem with someone challenging the benchmarks.

I am yet to come across figures that can challenge the current one.

I have no idea how people convince themselves to contribute to open source and/or participate regularly in online discourse. It’s basically working really hard for the easiest-to-offend and least-likely-to-appreciate-anything-you-do people on earth, so they can throw shade at everything you do, and then get mad because you did something different than they would have done.

FWIW, I appreciate everything you all are doing, even the stuff I’m not using right now. You all don’t hear it enough, and you’re certainly not getting paid enough for what you’re doing.

I contribute loads to open source too.

It’s actually not that hard to do so without talking trash about other projects. In fact I find those kinds of comparisons are often the laziest and least researched ways of promoting a particular project, as anyone who’s spent any time dissecting other people’s work in detail (not just running ‘ps’ in another TTY) will usually gain an appreciation for the hard work and design decisions that have gone into competing solutions.

But that’s just the opinion of one guy who has been contributing to the open source community for the last two decades. ;)

> talking trash about other projects

Cleverly fabricated and blown out of proportion. Yes, 2 decades of rich experience sure teaches that!

To clear the context for others, you are talking about a list of performance numbers here, and as I said, I am yet to come across figures that can challenge the current one.

The author of the other project did challenge them though. Hence this entire thread ;)

Yes, and he has received pointers on how he can reduce memory usage in his program (an issue he mentioned exists).

> What I actually care most about a file manager is how they perform on mounts with low IOPS

Purely technical questions (let's put `being very snarky` and `dishonesty` and my other irrelevant personal traits aside):

- How does "low IOPS" affect readdir()/scandir() and lstat64() in two C utilities differently?

- What else would be affected?

It’s about how and when they get used. Eg if you’re running on a lower performing mount then you might want to rely on caches more than frequent rescans. What I would often do in performance-critical routines running against network mounts was store a tree of inodes and timestamps and only rescan a directory if the parent’s timestamp changed. It meant I’d miss any subsequent metadata changes to individual files (mtime, size, etc) but I would capture any new files being created. That was the main thing I cared about, so that was the trade-off I was willing to make.
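
A minimal Python sketch of the kind of cache described above, offered as an illustration rather than the commenter's actual code. It assumes "the parent's timestamp" means the mtime of the directory being listed; `DirCache`, `listdir` and `rescans` are hypothetical names, and the cache is keyed by path instead of holding a full tree.

```python
import os
import tempfile


class DirCache:
    """Cache directory listings and rescan only when the directory's own
    inode or mtime changes. Hypothetical helper, not taken from any tool
    discussed in this thread."""

    def __init__(self):
        self._entries = {}  # path -> (inode, mtime_ns, sorted names)
        self.rescans = 0    # how many times the directory was really read

    def listdir(self, path):
        st = os.stat(path)  # a single stat call on the directory itself
        cached = self._entries.get(path)
        if cached and cached[0] == st.st_ino and cached[1] == st.st_mtime_ns:
            return cached[2]  # directory looks unchanged: serve from RAM
        names = sorted(os.listdir(path))  # the expensive part on a slow mount
        self.rescans += 1
        self._entries[path] = (st.st_ino, st.st_mtime_ns, names)
        return names


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        open(os.path.join(root, "a.txt"), "w").close()
        cache = DirCache()
        cache.listdir(root)
        cache.listdir(root)  # answered from the cache, no second scan
        print("rescans:", cache.rescans)
```

Rewriting an existing file leaves the directory's mtime alone, so the cached listing is served without a rescan; that is the missed-metadata trade-off mentioned above. Creating or deleting an entry normally updates the directory's mtime, although coarse timestamp granularity on some filesystems can hide changes that land within the same tick.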

There’s no right or wrong answer here though. Just different designs for different problems. Which was also the point the other developer was making when he was talking about his memory usage.

> There’s no right or wrong answer here though.

The processor cache plays an important role which you are ignoring.

External storage devices: most of the time they are write-back and even equipped with read-ahead. Yes, I know there are some exceptions but if you are write-through non-read-ahead you _chose_ to be slow in your feedback already and this discussion doesn't even apply.

Network mounts: cache coherency rules apply to CIFS as well. And again, if you _choose_ to ignore/disable, you are OK to be slow and this discussion does not apply.

If `nnn` takes n secs the first time, another utility will take around the same time on the first startup (from a cold boot).

Now the next scans where you go into subdirs would be much faster even in `nnn` due to locality of caching of the information about the files (try it out). The CPU cache already does an excellent job here. And if you go up, both `nnn` and the other utility would rescan.

> point the other developer was making

Yes, he was saying - my memory usage may be 15 times higher because of storing all filenames (in a static snapshot!!!) but you are dishonest if you show the numbers from `top` output without reading my code first for an education of my utility.

I’m not sure what you mean by processor cache here. The processor wouldn’t cache file system meta data. Kernels will, but that is largely dependent on the respective file system driver (eg none of the hobby projects I’ve written in FUSE had any caching).

Different write modes on external hardware also confuse the issue because you still have slower bus speeds (eg USB2 for an older memory stick) to and from the external device than you might have with dedicated internal hardware.

> The processor wouldn’t cache file system meta data

What _file system meta data_? The processor doesn't care what data it is! I am talking of the hw control plane and you are still lurching at pure software constructs like the kernel and its drivers.

All the CPU cares about is the latest data fetched by a program. The CPU cache exists to store program instructions and _data_ (no matter where it comes from) used repeatedly in the operation of programs, or information that the CPU is likely to need next. If the data is still available in the cacheline and isn't invalidated, the CPU won't fetch it from an external unit (so a bus request is not even made). _Any_ data coming to the CPU sits in some Ln cache, source notwithstanding. The external memory is accessed in case of cache misses. However, the amount of metadata these utilities fetch is very small, so the probability of a miss is greatly reduced. Moreover, your hypothetical utility also banks on the assumption that this data won't change and it wouldn't have to issue too many rescans to remain performant.

It's the same thing you see when you copy a 2 GB file from a slow storage to SSD and the first time it's slow but the next time it's way faster.

You can see it for yourself. Run `nnn` on any external disk you have (with a lot of data preferably), navigate to the mountpoint, press `^J` (1. notice the time taken), move to a subdir, come back to the mountpoint again (2. notice the time taken). You would see what I mean.
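
The same experiment can be approximated outside `nnn` with a short Python script. This is only a sketch under assumptions: `timed_scan` is a hypothetical helper, and it lists a single directory and fetches lstat-style metadata for each entry, which only loosely resembles what a file manager does on entering a directory.

```python
import os
import sys
import time


def timed_scan(path):
    """List `path`, fetch lstat-style metadata for every entry, and
    return (entry_count, elapsed_seconds)."""
    start = time.perf_counter()
    count = 0
    with os.scandir(path) as entries:
        for entry in entries:
            entry.stat(follow_symlinks=False)  # metadata without following links
            count += 1
    return count, time.perf_counter() - start


if __name__ == "__main__":
    # Pass the mountpoint as the first argument; defaults to the current dir.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for label in ("first", "second"):
        n, secs = timed_scan(target)
        print(f"{label} scan: {n} entries in {secs * 1000:.2f} ms")
```

For the first scan to be genuinely cold, the directory must not have been read since the device was attached. The timings show that some cache served the second scan, but they do not by themselves say which layer that cache lives in.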

> none of the hobby projects I’ve written in FUSE had any caching

On a side note (and though not very relevant here), all serious drivers (e.g. those from Tuxera, btrfs) maintain a buffer cache (https://www.tldp.org/LDP/sag/html/buffer-cache.html). They always boost performance. If our Ln misses, this is where we would get the metadata from and _hopefully_ not from the disk which is the worst case.

Yeah, that’s the kernel caching that (as I described), not some hardware specific thing the CPU is doing. Not disagreeing with you that L1 and L2 caches do exist on the CPU (and L3 in some instances) but they are much too small to hold the kind of data you’re suggesting. It’s really the kernel freeable memory (or file system driver - eg in the case of ZFS) where file system data - inc file contents too - will be cached. The CPU cache is much too valuable for application data to fill it with file system meta data (in my personal opinion, you might override that behaviour in nnn but I’d hope not).

However regardless of where that cache is, it’s volatile, it’s freeable. And thus you cannot guarantee it will be there when you need it. Particularly on systems with less system memory (remember that’s one of your target demographics).

If you wanted though, you could easily check if a particular file is cached and if not, perform the rescan. I can’t recall the APIs to do that off hand but it would be platform specific and not all encompassing (eg ZFS cache is separate to the kernel’s cache on Linux and FreeBSD) so it wouldn’t be particularly reliable nor portable. Plus depending on the syscall used, you might find it’s more overhead than an actual refresh on all but a rare subset of edge cases. As an aside, this is why I would build my own cache. Sure it cost more RAM but it was cheaper in the long run - fewer syscalls, less kernel/user space memory swapping, easier to track what is in cache and what is not, etc. But obviously the cost is I lose precision with regards to when a cache goes stale.

While on the topic of stale caches, I’ve actually run into issues on SunOS (or was it Linux? I used a lot of platforms back then) where contents on an NFS volume could be updated on one host but various other connected machines wouldn’t see the updated file without doing a ‘touch’ on that file from those machines. So stale caches are something that can affect the host as well.