Hacker News
by orionblastar 3993 days ago
Until there is a Machine Learning algorithm that can detect CP, you'll have to have human beings flag it and then other human beings view it and remove it.

Someone brought to my attention that Bing's cache is full of CP: after the offending websites are taken down, Bing keeps the images for a long time. The Rapidshare sites are also full of it, and the RAR files are password-protected so admins cannot peek into them. It is a major problem that has no solution yet. People run WordPress blogs, and spambots leave comments that link to CP sites.

This has become a hot-button issue because the manager of the foundation run by that Jared guy from Subway was found with CP, and investigators then raided Jared's computers and found more evidence.

My ethics and morals won't allow me to look at porn, but it is a big industry. There are all kinds of porn out there. CP is the worst of it, and a lot of children are trafficked as sex slaves for it. They grow up with a criminal record and a sex offender record, and by the time they get the record expunged they are in their 40s and can't find work. During the Opal CoC debates, a woman in that situation contacted me on GitHub. She is trying to get out of her situation by programming and cannot find work because of her record.

This CP stuff ruins the lives of the children who suffer abuses for it. Once they grow up they have a hard time in life trying to make ends meet. Some have serious psychological problems that are hard to treat and deal with.

I remember that in some cases a website is held responsible for the content that users post on it. Laws in your nation may vary on that. If you find illegal content you should remove it, lest you be found liable for it. Make sure to report the IP address of the poster to the government or a non-governmental agency that handles it.

2 comments

Are there any good ML algorithms for detecting porn at all? I tried to implement the standard "pink detector" with mixed results.
No. I looked into this a lot for a dating app I ran, and no algorithm came close to human moderation, even for images you'd consider obviously pornographic. Human moderation, though, can get expensive.

A funny idea I had was to reverse the whole system: feed the UGC of a site that's supposed to be SFW into a porn site that is definitely NSFW, one that has lots of thumbnails. The ones that don't get any clicks to enlarge probably aren't porn and can pass the test :)

That's brilliant!
Until you go searching for porn and every other picture is corn on the cob or doorknobs.
ML is very difficult for porn. You end up with so many false positives that it becomes almost useless compared to human moderation. "Porn" is also a very context-sensitive term. What is porn vs. nude art? Is it still porn if the people in it aren't "pink"? What about cultural differences? What is considered inappropriate in the US may not be inappropriate in Europe, etc. How much skin has to be shown for it to be porn? There are so many questions. Outrageous as it sounds, I'm sure detecting porn accurately will be one of the hardest problems we'll overcome in computer vision, because of the level of context that is required.
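
For anyone who hasn't seen one, a "pink detector" of the kind mentioned upthread is roughly a skin-tone pixel counter. Here is a minimal sketch, not anyone's actual code from this thread: the RGB thresholds are a commonly cited fixed-threshold skin rule, and the names `is_skin`, `skin_ratio`, `looks_nsfw` and the 0.4 cutoff are made up for illustration.

```python
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel 0-255


def is_skin(r: int, g: int, b: int) -> bool:
    """Fixed-threshold RGB skin rule (an assumed heuristic, tuned for
    lighter skin under even lighting; it misses many real skin tones)."""
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def skin_ratio(pixels: Iterable[Pixel]) -> float:
    """Fraction of pixels that pass the skin rule; 0.0 for an empty image."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    hits = sum(1 for r, g, b in pixels if is_skin(r, g, b))
    return hits / len(pixels)


def looks_nsfw(pixels: Iterable[Pixel], threshold: float = 0.4) -> bool:
    """Flag the image when the skin-colored share reaches the cutoff.
    The 0.4 default is arbitrary, chosen only for this example."""
    return skin_ratio(pixels) >= threshold
```

Feed it a picture that is mostly a sand-colored wall, say every pixel at (210, 160, 120), and `looks_nsfw` returns True, which is the false positive problem in a nutshell: the rule sees color, not context.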

See also: SFW porn, where other images are superimposed on top of real porn. It's hilarious, but is it really SFW? You decide. http://www.reddit.com/r/sfwporn

ML is very limited. For example, it tagged African-American people as gorillas, and it tagged white people as dogs or goats. The African-American people were more outraged because being called a gorilla is racist, even if the ML doesn't know what race is. Facial recognition is also hard: the ML confuses some people with others who have similar faces. ML just isn't yet reliable enough to do what we ask of it.
You'd think HN would spot a market opportunity like that and exploit it. Good programmers with unfair criminal records at below market rates?