fastdup is a tool that lets you gain insights from large image and video collections.
It lets you identify image duplicates, video duplicates, wrong labels, outliers, corrupted data, and image clusters.
fastdup is:
Unsupervised: fits any visual dataset.
Scalable: handles 400M images on a single machine.
Efficient: works on CPU (even on Google Colab with only 2 CPU cores!).
Low Cost: can process 12M images on a cloud machine budget of $1.