by dnth
1206 days ago
The authors of fastdup ran an analysis on LAION 400M and ImageNet21K. Here's what they found.

LAION 400M
> 60M duplicates.
> 962K broken images.
> Various label discrepancies.

ImageNet21K
> 1.2M duplicate images.
> 104K train/val leak.

fastdup GitHub repo - https://github.com/visual-layer/fastdup
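
To make "duplicates" and "train/val leak" concrete, here is a minimal sketch in plain Python. It is not fastdup's method or API: it only hashes raw file bytes, so it catches byte-identical copies and will miss resized or re-encoded near-duplicates, which need some form of visual similarity. The function names (`file_digest`, `exact_duplicates`, `train_val_leak`) are hypothetical and chosen for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's raw bytes."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large images are not loaded at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def group_by_digest(paths):
    """Map each digest to the list of files that share it."""
    groups = defaultdict(list)
    for p in paths:
        groups[file_digest(p)].append(Path(p))
    return groups


def exact_duplicates(paths):
    """Return groups of two or more byte-identical files."""
    return [g for g in group_by_digest(paths).values() if len(g) > 1]


def train_val_leak(train_paths, val_paths):
    """Return validation files whose bytes also appear in the training split."""
    train_digests = set(group_by_digest(train_paths))
    return [Path(p) for p in val_paths if file_digest(p) in train_digests]
```

Calling `train_val_leak(train_files, val_files)` on two lists of image paths returns the validation images that also exist, byte for byte, in the training set. Counts like the ones above would generally be higher than what this exact-match check finds, since it ignores near-duplicates.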