Hacker News
by Footnote7341 910 days ago
Such a vanishingly small percentage of the images that it's not even worth calculating. Of course, search engines contain these links too, and LAION-5B only contains links, not images.

Protect the kids!!! Or something. Can we do a scandal next about how many 'extremist' images are in the dataset too? Anti-vaxxer, climate denier, Nazi, religious extremist propaganda, scientific misinformation. Maybe we're all safer using corporate models with closed datasets so no one gets any of the wrong ideas.

3 comments

For the children who have been victimised, I doubt the small percentage provides any comfort.
The victims were already victimized many years ago. Finding this content in a training set doesn't re-victimize them.
Yes, and it's better we spend the effort on preventing the physical abuse from happening in the first place.
Wtf, if you can get rid of the images, why wouldn't you? Clearly it's not an exorbitant amount of work, as shown in the paper; they use very conventional techniques.
CSAM does not fall under “freedom of expression”, everything else you list does. It’s not a slippery slope.

Resist anyone’s attempt to make everything else you list illegal to express or possess.