I would imagine the problem is very similar in spirit to building a Huffman code from symbol probabilities. Given the distribution of file access probabilities, the number of scans and clicks needed to reach a file in the folder hierarchy corresponds to the length (cost) of its code word. (But I’m being a bit loose.)
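To make the analogy concrete, here is a minimal sketch. It assumes each folder holds at most `k` entries and that each level descended costs one click. A k-ary Huffman construction over the file access probabilities then gives the tree with the lowest expected depth for that fan-out. It counts only clicks, not the time spent scanning each folder's contents. The function name `huffman_expected_depth` and the example probabilities are hypothetical.

```python
import heapq
from typing import Sequence


def huffman_expected_depth(probs: Sequence[float], k: int = 2) -> float:
    """Expected leaf depth of an optimal k-ary Huffman tree.

    Leaves are files and internal nodes are folders holding at most k
    entries, so the result is the minimum expected number of clicks
    (levels descended) to reach a file under that fan-out limit.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    if len(probs) <= 1:
        return 0.0
    # Pad with zero-probability dummy leaves so every merge combines
    # exactly k nodes (a full k-ary tree needs (n - 1) % (k - 1) == 0).
    heap = [float(p) for p in probs]
    while (len(heap) - 1) % (k - 1) != 0:
        heap.append(0.0)
    heapq.heapify(heap)
    expected = 0.0
    while len(heap) > 1:
        # Group the k least-accessed nodes into one new folder.
        folder = sum(heapq.heappop(heap) for _ in range(k))
        # Every file inside that folder is now one click deeper, which adds
        # the folder's total access probability to the expected depth.
        expected += folder
        heapq.heappush(heap, folder)
    return expected


# Files accessed with probabilities 1/2, 1/4, 1/8, 1/8:
print(huffman_expected_depth([0.5, 0.25, 0.125, 0.125]))       # 1.75 (binary folders)
print(huffman_expected_depth([0.5, 0.25, 0.125, 0.125], k=4))  # 1.0 (one flat folder)
```

The sum of all internal-node weights equals the expected leaf depth, which is why the loop only needs to accumulate each merged folder's probability. Raising `k` trades depth for longer folder listings, which is where the "scans" half of the cost would come in.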
“Folder navigation is the main way that personal computer users retrieve their own files.”
Maybe in 2010, but is that true now? Is it true of younger users? I personally find all my files through search (Alfred on macOS), and I know many others who do too.
“Folder structures were found to be shallow (files were retrieved from mean depth of 2.86 folders)”
Again, I don’t know if this is still true. When you have to click through each folder manually, shallow structures make sense. But when you can search all directories at once, depth becomes much less of a constraint, and you gain the benefits of deep structures without the drawbacks (i.e. cd’ing through 10 folders).
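A minimal sketch of why depth stops mattering once you can search: a recursive glob finds a file by name no matter how many levels down it sits. This uses the standard library's `pathlib.Path.rglob`. The helper name `find_by_name` is hypothetical. Desktop search tools typically query a prebuilt index rather than walking the tree on every query.

```python
from pathlib import Path


def find_by_name(root, pattern):
    """Return every file under root whose name matches the glob pattern.

    rglob descends into all subdirectories, so a file 10 levels deep costs
    the user the same single query as one at the top level.
    """
    return sorted(p for p in Path(root).rglob(pattern) if p.is_file())


# e.g. find_by_name(Path.home() / "Documents", "*.scm")
```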
"strong preferences for navigation over search" is surprising to me.
For email, Inbox Zero means everything is basically in Archive, and is findable by successive application of filters.
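The "successive filters" idea can be sketched as narrowing a flat archive one predicate at a time. The sketch assumes each message is a simple record with sender, subject and date fields. The `Message` type, the `narrow` helper and the sample data are all hypothetical and do not reflect any mail client's API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Message:
    sender: str
    subject: str
    sent: date


Filter = Callable[[Message], bool]


def narrow(archive: Iterable[Message], *filters: Filter) -> List[Message]:
    """Apply filters one after another, like refining a search query."""
    result = list(archive)
    for keep in filters:
        result = [m for m in result if keep(m)]
    return result


archive = [
    Message("bank@example.com", "Statement for March", date(2021, 3, 31)),
    Message("bank@example.com", "New card", date(2020, 6, 1)),
    Message("friend@example.com", "Dinner?", date(2021, 3, 5)),
]
hits = narrow(
    archive,
    lambda m: "bank" in m.sender,   # first narrow by who sent it
    lambda m: m.sent.year == 2021,  # then by roughly when
)
print([m.subject for m in hits])  # ['Statement for March']
```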
For files, the requirements of a "project", i.e. related files in one repo, mean I need to traverse directories.
Recalling a file by the attributes I remember is way easier for me (it was written in SCSH, something something Pascal's Triangle) than mapping it to a hierarchy (is that in pkg/scsh or pkg/cs101 or ...).
Hard to believe other people have a super good mapping from memories to file hierarchies. Or maybe repo-based hierarchies are just dumb?
> For email, Inbox Zero means everything is basically in Archive, and is findable by successive application of filters.
No. Inbox Zero means that almost nothing is in the inbox. I practice Inbox Zero, but I don't put everything into Archive. The main kind of mail I archive is personal messages (they're hard to categorize). Finance, newsletters, German classes, healthcare, job hunting and searching for a flat to rent each get their own folder. Archive is the default when a kind of mail isn't expected to generate much volume, or when something is hard to categorize in general. I do use search, but the folders help with searching too; I guess I use them more like tags than folders. I've considered setting up notmuch with notmuchsync, but haven't got around to it yet.
As for organizing files on the filesystem, I do have mildly deep hierarchies. One nice thing about folders as tags is that you can construct a hierarchy in which tag A implies tag B. So, e.g., you don't need to tag something as "finance" if you've already tagged it as "taxes".
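One way to read "folders as tags" is that a file's full tag set is every folder on its path. Filing under `finance/taxes` therefore implies both tags. Here is a minimal sketch, assuming POSIX-style relative paths. The `implied_tags` and `files_with_tag` helpers and the example paths are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Set


def implied_tags(relative_path: str) -> Set[str]:
    """Every ancestor folder of a file acts as a tag on that file.

    Filing a document under finance/taxes/ tags it "taxes" and, by
    implication, "finance", without tagging it twice.
    """
    return set(PurePosixPath(relative_path).parent.parts)


def files_with_tag(paths: Iterable[str], tag: str) -> List[str]:
    """All files carrying a tag, directly or through a deeper subfolder."""
    return [p for p in paths if tag in implied_tags(p)]


paths = [
    "finance/taxes/2021-return.pdf",
    "finance/bank/statement.pdf",
    "health/insurance.pdf",
]
print(files_with_tag(paths, "finance"))  # both finance/... files
```

One side effect of this reading: two folders with the same name in different branches collapse into a single tag.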
As for the preference for navigation over search, it's understandable to me: by going successively into deeper folders, you don't need to remember every "tag". The contents of the folders you browse remind you which tags are available. You don't need to know all the most specific tags right off the bat, and you're never presented with a zillion of them at once. You get gradually more specific based on context, and each step presents only a handful of tags relevant to where you are.
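The "only a handful of choices at a time" point can be illustrated by listing just the immediate subfolders of the current location. Each navigation step shows the next level instead of every leaf tag at once. This is a minimal sketch over a hypothetical list of relative file paths, and `next_choices` is a made-up name.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def next_choices(paths: Iterable[str], current: str = "") -> List[str]:
    """Immediate subfolders of `current`, i.e. the options one click away."""
    base = PurePosixPath(current).parts
    depth = len(base)
    choices = set()
    for p in paths:
        folders = PurePosixPath(p).parent.parts
        # Only files inside `current` that sit at least one folder deeper count.
        if folders[:depth] == base and len(folders) > depth:
            choices.add(folders[depth])
    return sorted(choices)


paths = [
    "finance/taxes/2021.pdf",
    "finance/taxes/2020.pdf",
    "finance/bank/march.pdf",
    "health/insurance.pdf",
    "notes.txt",
]
print(next_choices(paths))             # ['finance', 'health']
print(next_choices(paths, "finance"))  # ['bank', 'taxes']
```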
Yeah, it seems a lot of this reduces to preferences.
FWIW, my email has just INBOX, Archive, and Todo. When documenting, I log the Message-ID of an email so that I can just query on it later. For me, this feels like a don't-make-me-think setup: I just swipe left or right through my inbox (which gets ~100 emails/day) and then process Todo into Archive.
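Logging the Message-ID works well because RFC 5322 defines it as a unique identifier for a message, so it makes a stable lookup key. Here is a minimal sketch using Python's standard `email` package, assuming the raw messages are available as strings. `index_by_message_id` is a hypothetical helper. A real setup would query the mail store's own index instead.

```python
from email import message_from_string
from email.message import Message
from typing import Dict, Iterable


def index_by_message_id(raw_messages: Iterable[str]) -> Dict[str, Message]:
    """Map each message's Message-ID header to the parsed message."""
    index = {}
    for raw in raw_messages:
        msg = message_from_string(raw)
        msg_id = msg.get("Message-ID")
        if msg_id:  # skip messages that lack the header
            index[msg_id.strip()] = msg
    return index


raw = "Message-ID: <abc123@example.com>\nSubject: Lease contract\n\nBody text\n"
index = index_by_message_id([raw])
print(index["<abc123@example.com>"]["Subject"])  # Lease contract
```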
Search for files on computers has always been clumsy and time-consuming (it still is, for reasons that escape me TBH). No wonder people preferred navigation. Nowadays most people who need a personal archive surely have a note-taking system that links files into a folder/tag structure with much better search capabilities than operating systems ever had (including free-text search).
I suspect that if this research were done today, the results would come out strongly in favor of search over navigation.
Interesting article, but I love the precision of the formula on page 24: e.g. retrieving a document when there are no subfolders takes 4.956 seconds. Not 5 seconds. Those 44 missing milliseconds are really noticeable for a user. With a std error of 0.71?
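Taking the figures in the comment at face value, assume 0.71 s is the standard error of that predicted time and that the sampling distribution is roughly normal. Both are assumptions, since the paper may define these quantities differently. Under them, a 95% interval shows how little the third decimal means:

```latex
\[
4.956 \pm 1.96 \times 0.71 = 4.956 \pm 1.392 \;\Rightarrow\; [3.564,\ 6.348]\ \text{s}
\]
```

The 0.044 s gap between 4.956 s and 5 s is a small fraction of that interval, which is roughly 2.8 s wide.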
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.33...