Hacker News
by mhx77 2026 days ago
Files that you've accessed will be kept in the kernel's cache. The cache I was talking about is a cache for decompressed blocks. A single file can span multiple blocks, so you need to be able to keep more than one decompressed block in memory anyway. Beyond that minimum, decompressed blocks stay in the cache in the hope that further (or even concurrent) reads will access the same blocks. Taking the example from the README where over 1000 Perl binaries are executed concurrently, that cache typically has hit rates of 99+%:

  $ dwarfs perl-install.dwarfs mnt -f
  23:02:42.673390 dwarfs (0.2.1)
  23:02:42.676663 file system initialized [1.94ms]
  23:02:49.210158 blocks created: 226
  23:02:49.210189 blocks evicted: 194
  23:02:49.210216 request sets merged: 123
  23:02:49.210241 total requests: 50056
  23:02:49.210270 active hits (fast): 1515
  23:02:49.210293 active hits (slow): 833
  23:02:49.210318 cache hits (fast): 47482
  23:02:49.210343 cache hits (slow): 0
  23:02:49.210392 fast hit rate: 97.8844%
  23:02:49.210417 slow hit rate: 1.66414%
  23:02:49.210441 miss rate: 0.451494%

For example, reducing the cache size from the default of 512M to 32M increases the time it takes to run 1139 binaries from 2.5 seconds to almost 40 seconds.
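The three percentages at the end of the log can be reproduced from its counters. The short Python sketch below does that, assuming (the numbers add up exactly, so this seems likely but is a reading of the log, not a description of how dwarfs computes it) that every request is counted as exactly one of: an active hit, a cache hit, or a miss that creates a new block, and that the "fast" and "slow" rates combine active and cache hits of that kind.

```python
# Counters copied from the dwarfs log output above.
total = 50056                       # total requests
active_fast, active_slow = 1515, 833
cache_fast, cache_slow = 47482, 0
created = 226                       # blocks created (assumed: one per cache miss)

# Assumption: each request is exactly one of these outcomes.
fast = active_fast + cache_fast
slow = active_slow + cache_slow
miss = created
assert fast + slow + miss == total

# Six significant digits, matching the precision shown in the log.
for label, n in (("fast hit rate", fast), ("slow hit rate", slow), ("miss rate", miss)):
    print(f"{label}: {100 * n / total:.6g}%")
```

This prints the same three values as the log. Fast and slow hits together come to about 99.5% of all requests, which is where the "99+%" figure comes from.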

Ah, I see. So this specifically saves the decompression time for data you've already decompressed, if another file references the same data?
Precisely.
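The mechanism discussed in this exchange can be illustrated with a toy least-recently-used (LRU) cache of decompressed blocks. This is a hypothetical Python sketch, not the dwarfs implementation: `BlockCache`, `read_file`, the (block, offset, length) extent format, zlib as the compressor, and a capacity counted in blocks rather than bytes are all assumptions made for illustration, and it ignores the concurrent "active" requests and the fast/slow distinction seen in the log. It shows two things: files whose data share the same blocks only pay for decompression once, and a cache too small to hold the working set turns every read into a miss, which is the effect behind the 512M vs. 32M timings.

```python
import zlib
from collections import OrderedDict


class BlockCache:
    """Hypothetical LRU cache of decompressed blocks, keyed by block number."""

    def __init__(self, compressed_blocks, capacity):
        self.compressed = compressed_blocks  # block number -> compressed bytes
        self.capacity = capacity  # assumption: capacity counted in blocks, not bytes
        self.blocks = OrderedDict()  # block number -> decompressed bytes, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, block_no):
        if block_no in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_no)  # mark as most recently used
            return self.blocks[block_no]
        self.misses += 1
        # Decompression is the expensive step that a cache hit avoids.
        data = zlib.decompress(self.compressed[block_no])
        self.blocks[block_no] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data


def read_file(cache, extents):
    """Read a file given as hypothetical (block_no, offset, length) extents."""
    return b"".join(cache.get(b)[off:off + n] for b, off, n in extents)


# Two 4 KiB blocks, compressed with zlib as a stand-in compressor.
raw = [bytes([i]) * 4096 for i in range(2)]
compressed = {i: zlib.compress(block) for i, block in enumerate(raw)}

# Two hypothetical files whose data live in the same blocks 0 and 1.
file_a = [(0, 0, 100), (1, 0, 100)]
file_b = [(0, 200, 50), (1, 300, 50)]


def run(capacity):
    """Read file_a, then file_b, and return (hits, misses)."""
    cache = BlockCache(compressed, capacity)
    for extents in (file_a, file_b):
        read_file(cache, extents)
    return cache.hits, cache.misses


print(run(capacity=2))  # (2, 2): file_b is served entirely from the cache
print(run(capacity=1))  # (0, 4): each block is evicted before it is reused
```

With room for both blocks, reading `file_b` costs no decompression at all; with room for only one, the two files keep evicting each other's blocks, so every request decompresses again.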