by ipsum2 1203 days ago
How did you deal with data contamination?
1 comment

The datasets we used are already fairly clean compared with LAION. We also filtered out images that have captions on them, and filtered by CLIP score. Btw, huge thanks to the LAION and Open_clip projects! They inspire us a lot.
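
For anyone wondering what filtering by CLIP score can look like, here is a minimal sketch, not necessarily what was done above. It assumes the image and text embeddings were already computed by some CLIP model. The function names and the 0.28 threshold are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Treat a degenerate (all-zero) embedding as a non-match.
        return 0.0
    return dot / (norm_a * norm_b)


def filter_by_clip_score(pairs, threshold=0.28):
    """Keep captions whose image/text similarity reaches the threshold.

    `pairs` is an iterable of (image_embedding, text_embedding, caption).
    The embeddings are assumed to come from a CLIP model; the default
    threshold is an illustrative assumption, not a recommended value.
    """
    kept = []
    for image_emb, text_emb, caption in pairs:
        if cosine_similarity(image_emb, text_emb) >= threshold:
            kept.append(caption)
    return kept
```

In practice the threshold would be tuned by inspecting samples on either side of the cutoff.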