Visualize your dataset using DINOv2 embeddings
1 point by dnth 1145 days ago
Visualizing a dataset (especially a large one) in a low-dimensional embedding space can reveal a lot about the patterns and clusters in your data.
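
As a minimal sketch of what "a low-dimensional embedding space" means in practice: given one embedding vector per image (for example a DINOv2 feature vector computed elsewhere), you can project the vectors to 2-D and scatter-plot them. The example below uses PCA via power iteration in plain Python. The function names are hypothetical, and PCA is not necessarily the projection the notebook uses (t-SNE and UMAP are common alternatives).

```python
# Hypothetical sketch: assumes embeddings were already computed (e.g. with DINOv2).
# Projects them to 2-D with PCA (power iteration + deflation). Needs >= 2 rows.
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def _center(rows: Sequence[Sequence[float]]) -> List[Vector]:
    """Subtract the per-dimension mean so the principal axes pass through the centroid."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - mean[j] for j in range(d)] for r in rows]


def _covariance(x: List[Vector]) -> List[Vector]:
    """Sample covariance matrix (d x d) of already-centered rows."""
    n, d = len(x), len(x[0])
    return [[sum(r[i] * r[j] for r in x) / (n - 1) for j in range(d)] for i in range(d)]


def _matvec(m: List[Vector], v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def _top_eigenvector(cov: List[Vector], iters: int = 200, seed: int = 0) -> Tuple[Vector, float]:
    """Power iteration: repeatedly multiply by cov and renormalize."""
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(len(cov))]
    norm = math.sqrt(sum(c * c for c in v))
    v = [c / norm for c in v]
    for _ in range(iters):
        w = _matvec(cov, v)
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:  # no variance left in this subspace
            break
        v = [c / norm for c in w]
    eigenvalue = sum(a * b for a, b in zip(v, _matvec(cov, v)))  # Rayleigh quotient
    return v, eigenvalue


def pca_2d(embeddings: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Project each embedding onto the top two principal components."""
    x = _center(embeddings)
    cov = _covariance(x)
    components = []
    for _ in range(2):
        v, lam = _top_eigenvector(cov)
        components.append(v)
        # Deflation: remove the found component so the next pass finds the next one.
        cov = [[c - lam * v[i] * v[j] for j, c in enumerate(row)] for i, row in enumerate(cov)]
    pc1, pc2 = components
    return [(sum(a * b for a, b in zip(r, pc1)), sum(a * b for a, b in zip(r, pc2))) for r in x]
```

The resulting `(x, y)` pairs can go straight into any scatter plot. For real DINOv2 features (hundreds of dimensions, thousands of images) a NumPy or scikit-learn PCA would be far faster; the pure-Python version only keeps the sketch dependency-free.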

We recently released a notebook showing how to visualize your dataset with DINOv2 models, running entirely on your CPU.

Yes! No GPUs needed.

We used it to find clusters of similar images, duplicates, and outliers in a subset of the LAION dataset.
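
The notebook relies on fastdup for this step. The sketch below is not fastdup's API or algorithm; it is a minimal illustration, assuming you already have one embedding vector per image, of how duplicates, clusters and outliers can be read off those vectors with cosine similarity. The function names and all three thresholds are hypothetical and would need tuning on real data.

```python
# Hypothetical sketch, not fastdup's API: assumes one embedding per image
# (e.g. DINOv2 features). Thresholds are illustrative, not recommended values.
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def analyze_embeddings(
    embeddings: Sequence[Sequence[float]],
    dup_threshold: float = 0.95,     # hypothetical cut-off for "near-duplicate"
    cluster_threshold: float = 0.8,  # hypothetical cut-off for "similar enough to group"
    outlier_threshold: float = 0.5,  # hypothetical: best neighbour below this => outlier
) -> Dict[str, list]:
    n = len(embeddings)
    # Full pairwise similarity matrix: O(n^2), fine for a sketch, not for LAION scale.
    sims = [[cosine_similarity(embeddings[i], embeddings[j]) for j in range(n)] for i in range(n)]

    duplicates = [(i, j) for i in range(n) for j in range(i + 1, n) if sims[i][j] >= dup_threshold]

    # An item is an outlier if even its most similar neighbour is not very similar.
    outliers = [
        i for i in range(n)
        if n > 1 and max(sims[i][j] for j in range(n) if j != i) < outlier_threshold
    ]

    # Single-linkage grouping: connected components of the graph whose edges
    # join pairs with similarity >= cluster_threshold, found with union-find.
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sims[i][j] >= cluster_threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    clusters = [members for members in groups.values() if len(members) > 1]

    return {"duplicates": duplicates, "clusters": clusters, "outliers": outliers}
```

For example, with vectors `[1, 0, 0]`, `[0.99, 0.01, 0]`, `[0.8, 0.4, 0]` and `[0, 0, 1]`, the first two come out as a near-duplicate pair, the first three form one cluster, and the last one is flagged as an outlier.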

Try it on your own dataset:

Colab notebook - https://colab.research.google.com/github/visual-layer/fastdup/blob/main/examples/dinov2_notebook.ipynb

GitHub repo - https://github.com/visual-layer/fastdup