by dontwearitout 946 days ago
Does anyone know of any photo management projects that include integration with recent ML models for classification and captioning?

It'd be awesome to have a system that could automatically generate model-versioned CLIP or SAM embeddings as metadata for your whole library, for downstream plugins to work with (deduplication, facial recognition, semantic search, text search, etc.).
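
To make the "model-versioned embeddings as metadata" idea concrete, here is a minimal sketch of what such a record could look like and how a downstream deduplication plugin might consume it. Everything here is an assumption for illustration: the `EmbeddingRecord` fields, the `near_duplicates` helper, and the 0.95 threshold are hypothetical, and the vectors are toy values rather than real CLIP output.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class EmbeddingRecord:
    """Hypothetical metadata record: one embedding for one photo."""
    asset_id: str               # hypothetical photo identifier
    model: str                  # e.g. "clip"
    model_version: str          # hypothetical version label
    vector: Tuple[float, ...]   # the embedding itself


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def near_duplicates(records: List[EmbeddingRecord],
                    threshold: float = 0.95) -> List[Tuple[str, str]]:
    """Return pairs of asset ids whose embeddings are very similar.

    Only vectors produced by the same model and version are compared,
    since embeddings from different versions are not comparable.
    """
    pairs = []
    for i, first in enumerate(records):
        for second in records[i + 1:]:
            same_model = (first.model, first.model_version) == (
                second.model, second.model_version)
            if not same_model:
                continue
            if cosine_similarity(first.vector, second.vector) >= threshold:
                pairs.append((first.asset_id, second.asset_id))
    return pairs
```

Storing the model name and version next to each vector lets a plugin skip (or recompute) embeddings from an older model instead of silently mixing them. The pairwise loop is quadratic, so a real library would want an index rather than this brute-force comparison.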

3 comments

PhotoPrism[0] does facial recognition and semantic search ("Your pictures are automatically classified based on their content and location. Many more image properties like colors, chroma, and quality can be searched as well.") Duplicate detection only finds exact matches.

[0] https://www.photoprism.app/features

I'm the author of PhotoStructure, which is still a web-based UI like Immich but has a strict requirement of not being the "system of record" for any metadata.

I built a prototype for a "curator" plugin API, but complexity kept me from releasing it. Here's the short list of "complexifiers":

1. Hourly/daily rate limits (say, from external API calls)

2. How long to cache results for a given input (say, when a new model is available)

3. Timeouts (to avoid sync processes getting "stuck" waiting for results)

4. How to effectively use those results (face grouping requires an ANN index to handle the high-dimensional vector embeddings, Whisper/OCR output requires a full-text index, ...)
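
Items 1 to 3 in the list above can be sketched as a thin wrapper around a plugin call. This is not PhotoStructure's prototype API; `CuratorRunner` and all of its parameters are hypothetical names, and the plugin is assumed to be a plain Python callable that takes bytes and returns a string.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, Dict, List, Optional, Tuple


class CuratorRunner:
    """Hypothetical wrapper adding rate limits, caching, and timeouts."""

    def __init__(self, plugin: Callable[[bytes], str], model_version: str,
                 max_calls_per_hour: int, cache_ttl_s: float,
                 timeout_s: float) -> None:
        self.plugin = plugin
        self.model_version = model_version
        self.max_calls_per_hour = max_calls_per_hour
        self.cache_ttl_s = cache_ttl_s
        self.timeout_s = timeout_s
        self._calls: List[float] = []  # start times of recent plugin calls
        # Cache key includes the model version, so a new model is a miss.
        self._cache: Dict[Tuple[str, str], Tuple[float, str]] = {}
        self._pool = ThreadPoolExecutor(max_workers=4)

    def run(self, input_key: str, payload: bytes) -> Optional[str]:
        """Return the plugin result, or None if rate limited or timed out."""
        now = time.monotonic()
        key = (self.model_version, input_key)

        # Item 2: reuse a cached result while it is still fresh.
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self.cache_ttl_s:
            return hit[1]

        # Item 1: sliding one-hour window over previous call times.
        self._calls = [t for t in self._calls if now - t < 3600.0]
        if len(self._calls) >= self.max_calls_per_hour:
            return None
        self._calls.append(now)

        # Item 3: stop waiting after timeout_s. The worker thread itself
        # keeps running in the background; only the caller is unblocked.
        future = self._pool.submit(self.plugin, payload)
        try:
            result = future.result(timeout=self.timeout_s)
        except FutureTimeout:
            return None
        self._cache[key] = (now, result)
        return result
```

Returning `None` for both "rate limited" and "timed out" keeps the sketch short; a real API would probably want to distinguish the two so the sync process knows whether to retry later or give up.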

It's item 4 that's the real doozy, at least in my mind. It might just be the case that there are only a handful of archetypes, but daily customer support and getting the Next Build out the door have put this task on the back burner for now.
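
One way to picture the "handful of archetypes" idea: each plugin result declares a kind, and the host routes it to a matching index. The sketch below assumes just two kinds, "vector" and "text". All class names are hypothetical, the brute-force `VectorIndex` is a stand-in for a real ANN index, and the tiny inverted index is a stand-in for a real full-text index.

```python
import math
from collections import defaultdict
from typing import DefaultDict, List, Sequence, Set, Tuple


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class VectorIndex:
    """Exact brute-force search; a real system would use an ANN index."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, Tuple[float, ...]]] = []

    def add(self, asset_id: str, vector: Sequence[float]) -> None:
        self._items.append((asset_id, tuple(vector)))

    def nearest(self, query: Sequence[float], k: int = 3) -> List[str]:
        ranked = sorted(self._items,
                        key=lambda item: _cosine(query, item[1]),
                        reverse=True)
        return [asset_id for asset_id, _ in ranked[:k]]


class TextIndex:
    """Whitespace-tokenized inverted index with AND semantics."""

    def __init__(self) -> None:
        self._postings: DefaultDict[str, Set[str]] = defaultdict(set)

    def add(self, asset_id: str, text: str) -> None:
        for token in text.lower().split():
            self._postings[token].add(asset_id)

    def search(self, query: str) -> Set[str]:
        tokens = query.lower().split()
        if not tokens:
            return set()
        matches = [self._postings.get(token, set()) for token in tokens]
        return set.intersection(*matches)


class ResultRouter:
    """Routes each plugin result to the index for its archetype."""

    def __init__(self) -> None:
        self.vectors = VectorIndex()
        self.text = TextIndex()

    def ingest(self, asset_id: str, kind: str, value) -> None:
        if kind == "vector":      # e.g. face or CLIP embeddings
            self.vectors.add(asset_id, value)
        elif kind == "text":      # e.g. OCR or transcription output
            self.text.add(asset_id, value)
        else:
            raise ValueError("unknown result kind: " + kind)
```

If the set of kinds really is small, the host only has to maintain one index per kind, and plugins never need to know how their results are stored or queried.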

(If anyone has seen this done nicely in other systems, I'm all ears!)

Immich uses CLIP for searching