by lostNFound 916 days ago
A question I haven't seen asked here (and maybe I'm ignorant), but doesn't this kind of content exist exclusively on the dark/deep web? I thought CSAM on the clearweb had finally been 1000% eliminated.
4 comments

No, there definitely is CSAM on the clearweb; it's just usually ephemeral, since it gets removed about as fast as it's added. Having a process to fight CSAM is (rather expensive) table stakes for anyone who wants to run a public service that includes user-generated content, because you will get "this kind of content" posted on your platform.

E.g. one of the issues with the changes to Twitter moderation after the management change was that reducing moderation manpower meant CSAM on Twitter suddenly became more prevalent.

The same applies to other media - e.g. an examination of Mastodon (https://www.theverge.com/2023/7/24/23806093/mastodon-csam-st...) found a pretty similar "CSAM rate" to the one in this LAION dataset.

Hm, I guess that makes sense: if there is a CSAM "fill rate" on the clearweb of X images per unit time, and we also assume a "remove rate" of approximately X, then there will always be a "rolling buffer" of around X images (its content would change every unit of time, but the quantity would stay the same), and that's probably what the algorithm picked up.
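A minimal sketch of that rolling-buffer reasoning, assuming a constant posting rate and a fixed time-to-removal (both are simplifications I'm making up for illustration, as are the names `steady_state_backlog`, `fill_rate` and `lifetime`). Under those assumptions the backlog settles at rate × lifetime, so it is "around X" only when removal takes about one unit of time:

```python
from collections import deque

def steady_state_backlog(fill_rate, lifetime, ticks):
    """Return the number of live items after each tick.

    fill_rate: items posted per tick (hypothetical constant)
    lifetime:  ticks an item survives before removal (hypothetical constant)
    """
    expiries = deque()  # one entry per batch: the tick at which it is removed
    history = []
    for t in range(ticks):
        # Drop every batch whose lifetime has elapsed.
        while expiries and expiries[0] <= t:
            expiries.popleft()
        # A new batch of fill_rate items arrives this tick.
        expiries.append(t + lifetime)
        history.append(len(expiries) * fill_rate)
    return history

if __name__ == "__main__":
    print(steady_state_backlog(10, 1, 5))  # removal within one tick
    print(steady_state_backlog(10, 3, 6))  # slower removal, larger backlog
```

The individual items change every tick, but the count stays flat once the first batches start expiring, which is the "always there in terms of quantity" point.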
How do you know if pictures of genitals somebody uploaded are 17 year old or 18 year old genitals where there is no visible difference between legal and illegal content?

What about cases where one man’s cute family photos in the bathtub are another man’s pornography? There are a bunch of reasons why complete elimination is impossible.

According to section 4 of the paper, over 700 of these were matches in the PhotoDNA database. I don't know if these have been perfectly scrubbed, but none of the popular image hosts would carry these.
I don't think that's possible.