The hashes are 16-character hexadecimal strings. I had a quick look at the faiss package and it looks promising. I would consider it for a future version.
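For anyone comparing these hashes directly, here is a minimal sketch, assuming each hash is a 16-character hexadecimal string (64 bits) and that similarity is measured by Hamming distance. The function names `hamming_distance` and `to_packed_bytes` are hypothetical and not part of the package.

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count the differing bits between two 16-character hex hashes."""
    if len(hash_a) != 16 or len(hash_b) != 16:
        raise ValueError("expected 16-character hexadecimal strings")
    # XOR leaves a 1 in every bit position where the two hashes differ.
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def to_packed_bytes(hash_hex: str) -> bytes:
    """Pack a 16-character hex hash into 8 raw bytes (assumed index input format)."""
    if len(hash_hex) != 16:
        raise ValueError("expected a 16-character hexadecimal string")
    return bytes.fromhex(hash_hex)
```

Packing the hashes into bytes like this is probably the first step needed before handing them to a binary nearest-neighbour index, though the exact input format would need checking against that library's documentation.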
If you're interested in collaborating, I'd be happy to help with a production-focused version. My work needs a shardable daemon for dedup tasks. My personal email is in my description, and I'm also available at josh@xix.ai.
We also have an image-heavy production use case that could yield some useful metrics from this tool.