Hacker News
by jerf 1267 days ago
About once every year or two, I remember something I read maybe 15 years ago and realize it would be absolutely perfect for some reason, but I can't find it. And being an engineer and an HN poster, my brain immediately leaps to "Oh, if only I ran a system that archived everything I browsed so I could build my own personal search engine that could search on just what I've ever looked at."

Then I smack it down, because that is a crap-load of effort to recall a link every year or two. And let's be honest, the marginal value of that link isn't all that great either... in the moment the need may seem large, but sitting here typing about this I couldn't tell you even a single such thing I've forgotten about, because that's how important they are... just more ephemera in the stream themselves.

My MP3 collection is a bit of a mess. I've cleaned up the worst instances of "Band, The" "The Band" "Band" "Band - The" sorts of duplication, but that's about it. My book collection is similarly messy. Heck, even my family photos are basically sorted only by year and not much else. So what? I can fix it. I can fix it all. But it's hard to even so much as recover the time I'd put into it once over, let alone in multiples.
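For what it's worth, the "Band, The" / "The Band" kind of duplication can be detected with a few lines of Python. This is only a sketch: `normalize_artist` is a made-up helper, and it assumes that a leading or trailing "The" is the only difference worth collapsing, which will be wrong for some real artist names.

```python
import re

def normalize_artist(name: str) -> str:
    """Return a comparison key so 'Band, The', 'Band - The',
    'The Band' and 'Band' all map to the same value."""
    s = name.strip()
    # Move a trailing article ("Band, The" / "Band - The") to the front.
    m = re.match(r"^(.*?)\s*[,-]\s*(the)$", s, flags=re.IGNORECASE)
    if m:
        s = f"{m.group(2)} {m.group(1)}"
    # Drop a leading article so "The Band" and "Band" compare equal.
    s = re.sub(r"^the\s+", "", s, flags=re.IGNORECASE)
    # Collapse runs of whitespace and ignore case.
    return " ".join(s.split()).casefold()

variants = ["Band, The", "The Band", "Band", "Band - The"]
keys = {normalize_artist(v) for v in variants}
```

Here `keys` ends up with a single entry, so the four spellings can be grouped under one directory or tag.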

(Much more important, especially for the family photos, is not losing them. So I've got a backup solution. But it's just a fire-at-directory solution, not all gloriously organized by type either.)

So I've learned to just sort of let the desire to have greater organization pass over me, Litany-against-Fear style. It's just a siren call.

4 comments

Photos are the worst.

I've been through at least three complete reorganizations where I even dusted off old backups and collected all photos (because I felt I was deleting photos too liberally last time), de-duplicated (and de-quadruplicated) them all, and built the new "forever" structure.

It sucks that I just know I'll do it again at most five years from now.
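The de-duplication step, at least, is easy to script. A minimal sketch, assuming that "duplicate" means byte-for-byte identical files (so re-encoded or resized copies are not caught); `find_duplicates` is a hypothetical name, not part of any tool mentioned here.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_duplicates(root):
    """Group files under `root` by the SHA-256 of their contents and
    return only the groups that contain more than one file."""
    by_hash = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            # Reads the whole file into memory; fine for photos,
            # hash in chunks instead for very large videos.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash[digest].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Each returned group lists the copies of one file, so you can keep the first path and review the rest before deleting anything.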

Photos are a good example of why a filesystem hierarchy is insufficiently expressive. You might want to search for pictures of me at parties, or pictures with me and my wife, or pictures from 1999, or pictures of LA, and the same photo might belong in all of those searches. No single category will ever be a good place for a photo.
That's a good example because it leads to the truth that NO up-front tagging will ever anticipate all the searches you might make in the future. There are so many possible searches.

I figure that a combination of wetware and software is the current sweet spot. My brain usually has enough associations and context to turn every photo search into a time or place filter - "I think it was downtown last year" or "some time in summer at home" or "it had my wife in it". The photo storage system need only provide search/filter on date and place to narrow it down to a few hundred thumbnails, plus machine-learning to tag people. Which is basically what iOS provides, no more, no less.

Any other up-front categorization or tagging is basically wasted effort.

For me, I wrote a bulk tool that renames my photo files by reverse geocoding the GPS information via OpenStreetMap. That way I can do text search for place, as well as 2D map search. It's at https://unto.me
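I don't know how that tool works internally, but the renaming idea can be sketched like this. Everything here is an assumption for illustration: the `place_slug` and `new_name` helpers, the `stem_place.ext` naming scheme, and the `reverse_geocode` callable, which stands in for whatever service turns a latitude/longitude pair into a place name.

```python
import re
from pathlib import Path

def place_slug(place: str) -> str:
    """Turn a place name into a filename-safe, searchable slug."""
    return re.sub(r"[^\w]+", "-", place).strip("-").lower()

def new_name(path: Path, lat: float, lon: float, reverse_geocode) -> Path:
    """Return `path` with the reverse-geocoded place appended to its stem.

    `reverse_geocode(lat, lon)` is a caller-supplied function (hypothetical
    here) that returns a human-readable place name as a string.
    """
    place = reverse_geocode(lat, lon)
    return path.with_name(f"{path.stem}_{place_slug(place)}{path.suffix}")

# Stand-in lookup so the sketch runs without any network access.
def fake_lookup(lat, lon):
    return "Santa Monica, Los Angeles"

renamed = new_name(Path("IMG_0042.jpg"), 34.01, -118.49, fake_lookup)
```

With the place in the filename, a plain `find` or `grep` over filenames is enough for the text search described above.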

I’ve been looking for a good tagging system for image and video files, something I can use to quickly and easily go through a stack of files and tag them, then search by tag later. Bonus points for being able to recompress on the way through, since phones seem to have terrible compression ratios compared to offline compressors.
This is what I want machine learning and face recognition for.

Problems local to my machine, not Orwellian nightmares.

That does seem like a really good solution. Google Photos will prompt you, if I recall correctly, to identify a few faces, and then automatically id the rest. That's fantastic, if you don't have to worry about putting their privacy in Google's hands.
Adobe Lightroom has it. Unfortunately it is quite expensive for the casual photographer.
I personally agree with your point (and find the loose textual search offered by phones these days to be mostly adequate).

But reading your comment gave me a thought: filesystem hierarchies are indeed insufficient, but what about filesystem hierarchies with liberal use of hardlinks?

That seems equivalent to a graph to me, and yes, I'm unaware of any kind of search that a graph does not permit. Indeed it could be the basis for a system that, in my opinion, would dominate any of the existing knowledge graph / tool for thought products. It would consist of three more pieces:

  * A database for backlinks. (Links from file X to file Y would only be possible when X has an appropriate file format -- `.txt`, `.md`, `.org`, etc.)

  * A search grammar with the following primitives:

    * find children of (links from) query results
    * find parents of (links into) query results
    * take the disjunction (OR) of queries
    * take the conjunction (AND) of queries
    * group queries with parentheses

  * The ability to pipe files found via ordinary shell commands into that grammar.
Given the size of most people's knowledge graphs, you wouldn't even need to keep a text index (à la Lucene) -- `find` and `grep` would be more than sufficient.
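To make the grammar above concrete, here is a rough Python sketch of the backlink database and the query primitives. The `LinkGraph` class and the file names are invented for illustration. AND and OR map onto set intersection (`&`) and union (`|`), and grouping is just ordinary parentheses.

```python
from collections import defaultdict

class LinkGraph:
    def __init__(self):
        self.out = defaultdict(set)   # file -> files it links to
        self.back = defaultdict(set)  # file -> files that link to it (backlinks)

    def add_link(self, src, dst):
        self.out[src].add(dst)
        self.back[dst].add(src)

    def children(self, results):
        """Files linked from any file in `results`."""
        return set().union(*(self.out.get(f, set()) for f in results))

    def parents(self, results):
        """Files that link into any file in `results`."""
        return set().union(*(self.back.get(f, set()) for f in results))

g = LinkGraph()
g.add_link("trips/la.md", "img/beach.jpg")
g.add_link("people/wife.md", "img/beach.jpg")
g.add_link("people/wife.md", "img/party.jpg")

# "pictures of LA" AND "pictures with my wife"
both = g.children({"trips/la.md"}) & g.children({"people/wife.md"})
# "pictures of LA" OR "pictures with my wife"
either = g.children({"trips/la.md"}) | g.children({"people/wife.md"})
```

The starting sets could just as well be file paths read from the output of `find` or `grep`, which covers the piping primitive.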
Or tag based, such as https://www.tagspaces.org/
For me, photos are a solved problem now. I no longer do any cleanup on them; I just assume that Apple's AI will show me the best photos when I search for them. I think that it is simply good enough already.
I put all my photographs in directories named year/year-month/year-month-day. For instance:

    ~/pictures/2022/202212/20221225
And I tag them with as many tags as I can be bothered with using XnView. XnView lets me find pictures by name or by tag.
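A small sketch of how that directory layout could be computed from a photo's date, following the example path above (no separators inside the date parts). `photo_dir` is a hypothetical helper; where the date comes from (EXIF, file mtime) is left to the caller.

```python
from datetime import date
from pathlib import PurePosixPath

def photo_dir(root: str, taken: date) -> PurePosixPath:
    """Build root/YYYY/YYYYMM/YYYYMMDD for the day a photo was taken."""
    return (PurePosixPath(root)
            / f"{taken:%Y}"
            / f"{taken:%Y%m}"
            / f"{taken:%Y%m%d}")

target = photo_dir("~/pictures", date(2022, 12, 25))
```

Here `str(target)` is `~/pictures/2022/202212/20221225`, matching the example.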
Digital photos are easy to organize... I store everything in d:\masterarchive\yyyy\yyyymmdd\ folders, and have since 1997

Tagging them with embedded IPTC tags is the way to go. DigiKam works (mostly) as a substitute for the late Picasa. (I'd still use Picasa, but the last version has a nasty bug in that it sometimes swaps faces in the recognition database, which then tends to corrupt it all.)

The major problem I've had is that in the beginning I didn't really have enough free disk space to keep up. That is no longer an issue, nor is it likely to be again.

this calls for an AI that recognizes pictures and categorizes them for us. what have we been solving all these captchas for?

photos and pictures organization should be a solved problem.

Google Photos' search function leverages some automatic categorization, and it's pretty good. I wish I had some way to run something at that level over my Lightroom 5 catalog.
Does your smartphone not already do that for you? iPhone does, I think Android does and I think iPhoto on macOS does as well. It wouldn’t surprise me if Google online photos or Facebook do also.

(That is, let you search using words for things in the photo or themes like “winter”).

Bookmarks are a good middle ground between saving everything and not being able to find something you read later. They are integrated into the browser's address bar as autocompletion/search, and if you can vaguely recall some words from the title you can find the page. I've been using this system for years; it works great with Firefox Sync and Firefox on Android too (to remember articles I discover on both mobile and desktop in the same place).
> My MP3 collection is a bit of a mess. I've cleaned up the worst instances of "Band, The" "The Band" "Band" "Band - The"

MusicBrainz Picard cleans and labels your music automatically using sound signatures, even when the file has no metadata. You can just give it your files and let it run; rarely have I felt the need to monitor it. It gets stuff right 99% of the time, and the remaining 1% is easily fixable whenever you come across it.

> "Oh, if only I ran a system that archived everything I browsed so I could build my own personal search engine that could search on just what I've ever looked at."

Well, that makes two of us.

Hey, I’ve been building that. It’s called A Personal Search Engine: https://apse.io
Oh! That is neat. Based on screenshots? Wouldn’t have thought of that.

I have many questions, about backup and disk space. I’m going to give it a try.

Thanks!