Hacker News
Find image duplicates and outliers – A free, scalable, efficient tool (github.com)
2 points by dnth 1183 days ago
1 comment

fastdup is a tool that lets you gain insights from a large image/video collection.

It lets you identify image duplicates, video duplicates, wrong labels, outliers, corrupted data, and image clusters.

fastdup is:

Unsupervised: fits any visual dataset.

Scalable: handles 400M images on a single machine.

Efficient: works on CPU (even on Google Colab with only 2 CPU cores!).

Low Cost: can process 12M images on a $1 cloud machine budget.
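To give a rough sense of what the simplest case of duplicate detection involves, here is a minimal sketch that groups byte-identical files by their SHA-256 digest. This is not fastdup's method or API (the post does not describe how fastdup works internally); the function name `find_exact_duplicates` is hypothetical, and the approach only catches exact copies, not resized or re-encoded near-duplicates.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_exact_duplicates(root):
    """Group files under `root` whose bytes are identical.

    Illustrative only: a plain content hash, not fastdup's approach.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            # Hash the raw file contents; identical bytes give identical digests.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    # Keep only digests shared by more than one file.
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling `find_exact_duplicates("my_images")` (a hypothetical directory) returns a list of groups, each holding the paths of files with identical content. Reading every file fully into memory is fine for a sketch, but a large collection would likely call for chunked hashing.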