| Hi HN, I built a duplicate file finder for my brother, a photography enthusiast who constantly runs out of storage because he resizes a lot of photos and ends up with many duplicates lying around. Notes:
- Incremental hashing: Instead of loading entire files into memory, I hash files in chunks. Files of identical size are grouped and hashed chunk by chunk until their hashes diverge or the files match completely.
- Perceptual hashing: For images, I use perceptual hashing (pHash), which generates a fingerprint from the visual content rather than the raw bytes. Visually similar images get hashes that differ in only a few bits.
- BK-Tree indexing: To efficiently search for similar hashes, I implemented a BK-tree that organizes hashes by Hamming distance. This lets me query "find all images within distance N" without comparing against every single hash.
- Configurable similarity: Users can adjust the Hamming distance threshold (1-15) to control how strict the matching is (lower values are stricter).
- macOS Services integration: You can right-click any folder in Finder and select "Scan for Duplicates".

The app has a free trial (10 scans or 7 days, whichever comes first) and then requires a license. I'm using Dodo Payments for licensing.

I'd love feedback from the community, especially on:
- Performance optimizations I might have missed
- Better UX patterns for the results view
- Edge cases in the similarity detection
- More feature suggestions

REQUIREMENTS: macOS 26.0.1 (Tahoe) and an Apple Silicon Mac.

Happy to answer questions about the implementation or architecture! |
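
For readers who want something concrete, here is a minimal Python sketch of the incremental hashing note: group files by size, then read each group in lockstep and regroup by running hash after every chunk. It is an illustration under assumptions, not the app's actual code. SHA-256, the 1 MiB chunk size, and the names `find_exact_duplicates` and `_split_by_content` are all choices made for this sketch.

```python
import hashlib
import os
from collections import defaultdict

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary choice for this sketch


def find_exact_duplicates(paths, chunk_size=CHUNK_SIZE):
    """Return lists of paths whose contents are byte-for-byte identical."""
    # Only files of the same size can be duplicates, so group by size first.
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    duplicates = []
    for group in by_size.values():
        if len(group) > 1:
            duplicates.extend(_split_by_content(group, chunk_size))
    return duplicates


def _split_by_content(group, chunk_size):
    """Read same-size files in lockstep, regrouping by running hash per chunk."""
    handles = {path: open(path, "rb") for path in group}
    hashers = {path: hashlib.sha256() for path in group}
    try:
        active = [list(group)]  # groups that are still indistinguishable
        finished = []           # groups that matched all the way to EOF
        while active:
            next_active = []
            for candidates in active:
                buckets = defaultdict(list)
                at_eof = False
                for path in candidates:
                    chunk = handles[path].read(chunk_size)
                    # Same size means every file in the group hits EOF together.
                    at_eof = not chunk
                    hashers[path].update(chunk)
                    buckets[hashers[path].digest()].append(path)
                for bucket in buckets.values():
                    if len(bucket) < 2:
                        continue  # this file diverged from all the others
                    (finished if at_eof else next_active).append(bucket)
            active = next_active
        return finished
    finally:
        for handle in handles.values():
            handle.close()
```

Only one chunk per file is held in memory at a time, and a file stops being read as soon as its running hash differs from every other file in its group. One trade-off in this sketch is that it keeps a file handle open for every file in a size group, which a real tool would need to cap for very large groups.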
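
The BK-tree note can be illustrated the same way. Below is a minimal Python sketch of a BK-tree over integer hashes with Hamming distance as the metric. Again, this is not the app's code: the `BKTree` class, its tuple-based node layout, and the tiny 4-bit hashes in the usage lines are assumptions made for illustration.

```python
class BKTree:
    """BK-tree over integer hashes, using Hamming distance as the metric."""

    def __init__(self):
        # Each node is a tuple: (hash_value, {edge_distance: child_node}).
        self.root = None

    @staticmethod
    def hamming(a, b):
        # Number of bit positions in which the two hashes differ.
        return bin(a ^ b).count("1")

    def add(self, value):
        if self.root is None:
            self.root = (value, {})
            return
        node = self.root
        while True:
            d = self.hamming(value, node[0])
            if d == 0:
                return  # hash already stored; a real index would keep a file list
            child = node[1].get(d)
            if child is None:
                node[1][d] = (value, {})
                return
            node = child

    def query(self, value, max_distance):
        """Return sorted (distance, hash) pairs within max_distance of value."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            node_value, children = stack.pop()
            d = self.hamming(value, node_value)
            if d <= max_distance:
                results.append((d, node_value))
            # Triangle inequality: matches can only sit under edges whose
            # label lies in [d - max_distance, d + max_distance].
            for edge, child in children.items():
                if d - max_distance <= edge <= d + max_distance:
                    stack.append(child)
        return sorted(results)


tree = BKTree()
for h in (0b0000, 0b0001, 0b0011, 0b1111):
    tree.add(h)
print(tree.query(0b0000, 1))  # [(0, 0), (1, 1)]
```

The pruning step is what avoids comparing against every hash: whole subtrees are skipped when their edge label falls outside the allowed band. The band widens as the threshold grows, so larger thresholds visit more of the tree.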