by burntsushi 659 days ago
I'm not aware of one. Any tool that tells you disk space has to actually crawl the directory tree to report it. But that is precisely the thing we want to parallelize.

The only other option I can think of is to dynamically adjust. Maybe after a certain amount of work has completed, spin up more threads. But I'm not sure it's worth doing.
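To make the "dynamically adjust" idea concrete, here is a minimal Python sketch. It is not how any existing tool does it. It starts with one worker and adds another whenever the queue of unvisited directories outgrows the current pool. The names `crawl_total_size`, `max_workers`, and `backlog_per_worker` are made up for illustration, and summing `st_size` is only a stand-in for real disk-usage accounting.

```python
import os
import queue
import threading


def crawl_total_size(root, max_workers=8, backlog_per_worker=4):
    """Sum file sizes under root, growing the thread pool as the backlog grows.

    Returns (total_bytes, number_of_workers_started).
    """
    pending = queue.Queue()   # directories waiting to be scanned
    lock = threading.Lock()   # guards `total` and `threads`
    threads = []
    total = 0

    def spawn():
        # Caller must hold `lock`.
        t = threading.Thread(target=worker)
        threads.append(t)
        t.start()

    def worker():
        nonlocal total
        while True:
            path = pending.get()
            try:
                if path is None:  # shutdown sentinel
                    return
                size = 0
                with os.scandir(path) as entries:
                    for entry in entries:
                        if entry.is_dir(follow_symlinks=False):
                            pending.put(entry.path)
                        elif entry.is_file(follow_symlinks=False):
                            size += entry.stat(follow_symlinks=False).st_size
                with lock:
                    total += size
                    # Add a worker only when queued directories outpace the pool.
                    backlog = pending.qsize()
                    if (len(threads) < max_workers
                            and backlog > backlog_per_worker * len(threads)):
                        spawn()
            except OSError:
                pass  # unreadable directory: skip it
            finally:
                pending.task_done()

    pending.put(root)
    with lock:
        spawn()
    pending.join()            # every queued directory has been processed
    for _ in threads:
        pending.put(None)     # one sentinel per worker
    for t in threads:
        t.join()
    return total, len(threads)
```

On a small tree this never leaves one thread. The pool only grows once the crawl has actually found enough directories to keep more workers busy, which is the whole point of adjusting after some work has completed.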

1 comment

Looking at inode metadata (specifically the link count of directory nodes) might give a one-step-ahead view of what's left to crawl at each level, so the thread count could be adjusted preemptively during recursion.

E.g. the `Links: 101` field for the `src` directory of the `curl` codebase:

  $ stat -x src
  
    File: "src"
    Size: 3232         FileType: Directory
    Mode: (0755/drwxr-xr-x)         Uid: (  501/    alex)  Gid: (   20/   staff)
  Device: 1,22   Inode: 5857579    Links: 101
  Access: Tue Aug 27 22:21:23 2024
  Modify: Tue Aug 27 22:21:19 2024
  Change: Tue Aug 27 22:21:19 2024
   Birth: Tue Aug 27 22:21:19 2024

But then that still involves dynamically adjusting and might be kind of overkill for a relatively uncertain benefit...
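
For reference, a rough Python sketch of reading that link count from code. It assumes the traditional convention where a directory has 2 links plus one per child. What counts as a "child" varies by filesystem: classically only subdirectories, some filesystems report 1 for every directory, and the macOS output above looks like it may count files too. So treat the result as a hint, not a fact. The function names are hypothetical.

```python
import os
import stat
from typing import Optional


def children_hint_from_nlink(nlink: int) -> Optional[int]:
    """Turn a directory's link count into a rough child-count hint.

    Assumes the "2 + one per child" convention. Returns None when the
    count is below 2, which suggests the filesystem does not track it.
    """
    if nlink < 2:
        return None
    return nlink - 2


def children_hint(path: str) -> Optional[int]:
    """Return a child-count hint for a directory, or None if unavailable."""
    st = os.lstat(path)  # a single stat call, no directory read
    if not stat.S_ISDIR(st.st_mode):
        return None
    return children_hint_from_nlink(st.st_nlink)
```

With the `Links: 101` value above, `children_hint_from_nlink(101)` gives 99. A crawler could compare that number against a threshold before deciding whether a directory is worth handing to extra threads.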