| > most files are nowhere near 10 levels deep, let alone hundreds of levels. Filesystem (and low-level in general) stuff must consider worst cases. There is a lot of software out there doing weird things. For instance, npm created a very deep folder hierarchy for a long time (so deep it messed up some path length restrictions in fact). One way or another the worst case is going to be hit. And then what? The entire computer grinds to a halt? How is the user supposed to discover why? > We can amortize the cost by updating the parent folder sizes only when the file size is changed by a significant amount since last update (say, 10%). So you have a log file inside a folder. Because it just keeps growing line by line then the folder's size is never updated. Now you have many Gb's of log file in that folder, and the folder says it is using "4Kbytes". Moreover, this propagates upwards the filesystem. In the end, your root drive has a "folder size" of X but its actual usage is Y >> X. How is that not going to confuse everyone? Put in another way, who is going to trust the X number ever? Why would you pay all that accounting penalty for every write to every file to end up with a half-asset broken-by-design feature? > Also I don't see how drives and hard linked files are any different than regular files in this context. They are different in that they exist in multiple folders at the same time (so they would trigger multiple size-updating branches). |
You misunderstood what I meant. I didn't mean we should only update if a single change is significant. I meant we only update when the cumulative changes since last update is significant. An example:
Let's say we create a 1mb file. The next time the file is changed, we only update the parent if the change is more than 10% (the new size is greater than 1.1mb or less than 0.9mb). For example if it is a log file and each line is 100 bytes, we update the parent after 1000 new lines (even though the 1000th line is still only 100 bytes). (of course the number of lines before the update depends on the size of the file, so if we had a 1GB log file, we would update after 1000000 new lines).
It is trivial to prove that the estimate is never off by a factor of more than 10% (even for the parent folders). So this is not a "half-assed broken-by-design feature" since it provides strong guarantees in bounds and at least in my personal day-to-day usage, I almost never care about the exact size of a folder but want to have a rough idea of how large it is.