> The processor wouldn’t cache file system meta data

What _file system metadata_? The processor doesn't care what the data is! I am talking about the hardware control plane, and you are still latching onto pure software constructs like the kernel and its drivers. All the CPU cares about is the latest data fetched by a program. The CPU cache exists to store program instructions and _data_ (no matter where it comes from) that are used repeatedly while programs run, or that the CPU is likely to need next. If the data is still in a cache line and hasn't been invalidated, the CPU won't fetch it from an external unit (so no bus request is even made). _Any_ data coming to the CPU sits in some Ln cache, regardless of its source. External memory is accessed only on cache misses. And the metadata these utilities fetch is very small, so the probability of a miss is much lower. Moreover, your hypothetical utility also banks on the assumption that this data won't change, so that it wouldn't have to issue too many rescans to remain performant.

It's the same thing you see when you copy a 2 GB file from slow storage to an SSD: the first time it's slow, but the next time it's way faster. You can see it for yourself. Run `nnn` on any external disk you have (preferably one with a lot of data), navigate to the mountpoint and press `^J` (1. note the time taken), then move to a subdirectory and come back to the mountpoint (2. note the time taken). You will see what I mean.

> none of the hobby projects I’ve written in FUSE had any caching

On a side note (though not very relevant here), all serious drivers (e.g. those from Tuxera, btrfs) maintain a buffer cache (https://www.tldp.org/LDP/sag/html/buffer-cache.html). It always boosts performance. If our Ln cache misses, this is where we would get the metadata from, and _hopefully_ not from the disk, which is the worst case.
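The `nnn` experiment above can also be approximated outside `nnn` with a small script that walks the same tree twice and times each pass. This is a minimal sketch, not how `nnn` implements `^J`. The script and the names `scan_tree`, `timed_scan` and `main` are hypothetical. How much faster the second pass is depends on the OS, the file system and whichever caching layer ends up serving it.

```python
import os
import sys
import time


def scan_tree(root: str) -> tuple[int, int]:
    """Walk `root` recursively, lstat()-ing every entry (metadata-only work,
    roughly what a disk-usage scan does). Returns (entries, total_bytes)."""
    entries = 0
    total = 0
    stack = [root]
    while stack:
        path = stack.pop()
        try:
            with os.scandir(path) as it:
                for entry in it:
                    try:
                        st = entry.stat(follow_symlinks=False)
                    except OSError:
                        continue  # entry vanished or is unreadable: skip it
                    entries += 1
                    total += st.st_size
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)  # descend later
        except OSError:
            continue  # unreadable directory: skip it
    return entries, total


def timed_scan(root: str) -> tuple[float, int, int]:
    """Run one full scan and return (seconds, entries, total_bytes)."""
    start = time.perf_counter()
    entries, total = scan_tree(root)
    return time.perf_counter() - start, entries, total


def main(argv: list[str]) -> None:
    if len(argv) != 2:
        print(f"usage: {argv[0]} DIR", file=sys.stderr)
        return
    # Pass 1 is "cold" (if nothing was cached yet), pass 2 reuses whatever is cached.
    for attempt in (1, 2):
        secs, entries, total = timed_scan(argv[1])
        print(f"pass {attempt}: {entries} entries, {total} bytes in {secs:.3f}s")


if __name__ == "__main__":
    main(sys.argv)
```

For example, `python3 scan_timing.py /mnt/external` (the file name and mountpoint are placeholders). On Linux, running `sync; echo 3 | sudo tee /proc/sys/vm/drop_caches` beforehand frees the page cache along with cached dentries and inodes, so the first pass starts cold.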
However, regardless of where that cache is, it’s volatile and freeable, so you cannot guarantee it will be there when you need it. This is particularly true on systems with little memory (remember that’s one of your target demographics).
If you wanted, though, you could easily check whether a particular file is cached and, if not, perform the rescan. I can’t recall the APIs to do that offhand, but they would be platform specific and not all-encompassing (e.g. the ZFS cache is separate from the kernel’s cache on Linux and FreeBSD), so it wouldn’t be particularly reliable or portable. Plus, depending on the syscall used, you might find it’s more overhead than an actual refresh in all but a rare subset of edge cases. As an aside, this is why I would build my own cache. Sure, it cost more RAM, but it was cheaper in the long run: fewer syscalls, less shuffling of memory between kernel and user space, easier tracking of what is and isn’t in the cache, etc. The obvious cost is that I lose precision about when the cache goes stale.
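The trade-off of rolling your own cache can be sketched as a user-space map from directory to a computed result (here, the total size of the regular files directly inside it). Within a time-to-live window, lookups cost no syscalls at all. After it, a single `stat()` of the directory decides whether to rescan. This is a minimal sketch assuming a TTL plus directory-mtime policy. `DirSizeCache`, `ttl` and `invalidate` are hypothetical names, not taken from `nnn` or any other tool. A directory's mtime only changes when entries are added, removed or renamed, so an existing file whose contents change goes unnoticed until the entry is explicitly invalidated. That is the loss of staleness precision described above.

```python
import os
import time
from dataclasses import dataclass


@dataclass
class _Entry:
    size: int          # cached result: bytes in regular files directly in the dir
    mtime_ns: int      # directory mtime observed just before the scan
    fetched_at: float  # monotonic time of the last (re)validation


class DirSizeCache:
    """Hypothetical user-space cache: trades freshness for fewer syscalls."""

    def __init__(self, ttl: float = 5.0) -> None:
        self.ttl = ttl                      # seconds to trust an entry blindly
        self._entries: dict[str, _Entry] = {}
        self.rescans = 0                    # number of full directory scans done

    def _scan(self, path: str) -> int:
        self.rescans += 1
        total = 0
        with os.scandir(path) as it:
            for entry in it:
                try:
                    if entry.is_file(follow_symlinks=False):
                        total += entry.stat(follow_symlinks=False).st_size
                except OSError:
                    continue  # file removed mid-scan: ignore it
        return total

    def size(self, path: str) -> int:
        now = time.monotonic()
        cached = self._entries.get(path)
        if cached is not None and now - cached.fetched_at < self.ttl:
            return cached.size  # no syscalls at all; may be stale
        # stat() before scanning: a change racing with the scan leaves an
        # older mtime recorded, so the next check triggers another rescan.
        mtime_ns = os.stat(path).st_mtime_ns
        if cached is not None and cached.mtime_ns == mtime_ns:
            cached.fetched_at = now  # directory entries unchanged: cheap revalidation
            return cached.size
        size = self._scan(path)
        self._entries[path] = _Entry(size, mtime_ns, now)
        return size

    def invalidate(self, path: str) -> None:
        """Forget a directory so the next lookup rescans it."""
        self._entries.pop(path, None)
```

A longer `ttl` means fewer syscalls and staler answers; `ttl=0` still avoids rescans while the directory's mtime is unchanged, at the price of one `stat()` per lookup.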
While on the topic of stale caches, I’ve actually run into issues on SunOS (or was it Linux? I used a lot of platforms back then) where contents on an NFS volume could be updated on one host, but various other connected machines wouldn’t see the updated file without doing a ‘touch’ on that file from those machines. So stale caches are something that can affect the host as well.