by yorwba 2791 days ago
It's not clear to me that you'd actually need to manually label individual images for training. The work of detecting and marking the censored regions is left to the user, so the model just needs to be good at inpainting. There are probably enough specialized sites to get a decently sized training set with little effort beyond writing a crawler.

I suggested that to deeppomf a while ago (I was thinking of simply using unlabeled anime images from https://gwern.net/Danbooru2017 with random area deletions, to keep the model and training process as simple as possible). His view is that genitals make up only a tiny fraction of any image, and while the rest of the images vary enormously, genitals are a small, narrow domain. So a generic inpainting/denoising CNN would learn to inpaint pretty much everything else and neglect genitals specifically.
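For anyone unfamiliar with the "random area deletions" idea, here is a minimal sketch of how one self-supervised training pair could be made from an unlabeled image. This is not deeppomf's pipeline (nobody in the thread knows what that is); `make_training_pair` and `max_frac` are hypothetical names, and a single rectangular hole is an assumption for simplicity.

```python
import numpy as np

def make_training_pair(image, rng, max_frac=0.3):
    """Hypothetical helper: delete one random rectangle from `image`.

    Returns (masked_input, hole_mask, target). The model would be trained to
    reconstruct `target` inside `hole_mask` given `masked_input`.
    Assumes `image` is an (H, W) or (H, W, C) array and 0 < max_frac <= 1.
    """
    h, w = image.shape[:2]
    # Box side lengths are between 1 pixel and max_frac of the image side.
    box_h = int(rng.integers(1, max(1, int(h * max_frac)) + 1))
    box_w = int(rng.integers(1, max(1, int(w * max_frac)) + 1))
    # Place the box uniformly at random, fully inside the image.
    top = int(rng.integers(0, h - box_h + 1))
    left = int(rng.integers(0, w - box_w + 1))
    hole_mask = np.zeros((h, w), dtype=bool)
    hole_mask[top:top + box_h, left:left + box_w] = True
    masked_input = image.copy()
    masked_input[hole_mask] = 0  # zero out the "deleted" area (all channels)
    return masked_input, hole_mask, image
```

Because the box is placed uniformly, a small target region will only rarely be what gets deleted, which is exactly the imbalance deeppomf is worried about: most of the training signal goes to everything else in the picture.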

Presumably if you trained a really big inpainting CNN a lot, it would learn genitals (along with everything else), but it's understandable that he would try a much more targeted approach.

So do you know what exactly the model was trained on? Unless I missed it, there's no training code in the repo, or any other indication of how data was prepared.
I'm not sure. I suggested Danbooru2017, as I mentioned, and I thought he was using it, but on double-checking his Reddit comments, he seems to imply he's using only a custom private dataset at this point. Maybe he hand-extracted a lot of censored/original pairs from various places.
A neural net that replaces jarring censorship with suspiciously conveniently placed objects? Hilarious...
You could be right, I don't know. Maybe the authors can tell us about their experience.

My intuition is that the dataset will be too imbalanced to learn anything useful. Even if you crawl only decensored images, the area you truly care about inpainting is still pretty small. If you don't focus on it somehow, it might learn how to inpaint anime-style geometry correctly (from the rest of the image) but produce "barbie doll" style anatomy.
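One common way to "focus on it somehow" is to upweight the reconstruction loss inside a region of interest. The sketch below is an illustration, not anything from the repo: `weighted_inpainting_l1`, `roi_mask`, and `roi_weight=10.0` are hypothetical, NumPy stands in for whatever framework would actually be used, and it assumes you have a per-pixel region-of-interest mask, which means tagged data.

```python
import numpy as np

def weighted_inpainting_l1(pred, target, hole_mask, roi_mask, roi_weight=10.0):
    """Hypothetical loss: mean L1 error over the hole, upweighting ROI pixels.

    pred, target: (H, W) or (H, W, C) float arrays.
    hole_mask:    (H, W) bool, pixels the model had to inpaint.
    roi_mask:     (H, W) bool, pixels of the under-represented region.
    """
    weights = hole_mask.astype(float)
    # Pixels that are both missing and in the region of interest count more.
    weights[hole_mask & roi_mask] *= roi_weight
    err = np.abs(pred - target)
    if err.ndim == 3:
        err = err.mean(axis=-1)  # average over channels -> (H, W)
    # Normalize by total weight so the loss scale doesn't depend on hole size.
    return float((weights * err).sum() / max(weights.sum(), 1e-8))
```

Normalizing by the summed weights keeps the loss comparable across holes of different sizes; the catch, as noted further down the thread, is that `roi_mask` has to come from labeled data.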

The demo in the repo only shows inpainting of anime-style geometry. In the 4chan thread linked by nathansmith, someone tried to apply it to a mosaicked penis, but it disappeared (NSFW, probably: https://i.4cdn.org/g/1540580868203.png). When they used the mosaic decensoring mode instead, it worked slightly better (NSFW, definitely: https://imgur.com/a/PwAunbc/). Other people complained about missing genitals as well.

So I'm guessing that this model doesn't have any specific training for filling in only genitals. I think there are category-aware inpainting algorithms that could be used, but they'd need tagged data, as you say.