Hacker News
by zawerf 2789 days ago
I believe "Image Inpainting for Irregular Holes Using Partial Convolutions" has been superseded by "Free-Form Image Inpainting with Gated Convolution", especially for interactive use cases?

https://arxiv.org/pdf/1806.03589.pdf

http://jiahuiyu.com/deepfill2/

I think the most interesting part of this project would be generating the dataset. That is a lot of manual redrawing of vaginas/penises. Or the reverse, where you find already decensored stuff and add realistic censorship yourself. Either way, a lot of time will be spent tagging genitals to build a large enough dataset to make NN techniques work...

4 comments

There are tens of thousands of manually decensored manga; you could download them and compare them to the originals to identify the censored regions and use those as training data, I suppose.
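A rough sketch of that comparison step, in pure Python. It assumes the two versions are already aligned, the same resolution, and reduced to grayscale pixel grids (lists of rows); real scans would likely need registration first, and translated text would show up as a difference too. The names `diff_mask`, `bounding_box`, and the `threshold` value are hypothetical, chosen only for illustration.

```python
def diff_mask(original, decensored, threshold=30):
    """Return a binary mask: 1 where the two images differ by more than threshold."""
    if len(original) != len(decensored) or any(
        len(a) != len(b) for a, b in zip(original, decensored)
    ):
        raise ValueError("images must have the same dimensions")
    return [
        [1 if abs(a - b) > threshold else 0 for a, b in zip(row_o, row_d)]
        for row_o, row_d in zip(original, decensored)
    ]


def bounding_box(mask):
    """Return (top, left, bottom, right), inclusive, around all changed pixels.

    Returns None when the mask is empty (the images match everywhere).
    """
    coords = [(y, x) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return None
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    return min(ys), min(xs), max(ys), max(xs)
```

The threshold is there to ignore small differences such as compression noise; the right value would have to be found empirically.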
Is it bad form to share one of these "datasets" here?
I'm not aware of an actual organized dataset, but you could start with e-hentai.org and, with the right searches (look for "decensored" or "english", then find the matching original Japanese versions there), put together hundreds or thousands of images in an evening.
Asking for a friend?
It's not clear to me that you'd actually need to manually label individual images for training. The work of detecting and marking the censored regions is left to the user, so the model just needs to be good at inpainting. There are probably enough specialized sites to get a decently-sized training set with little effort beyond writing a crawler.
I suggested that to deeppomf a while ago (I was thinking of simply using unlabeled anime images from https://gwern.net/Danbooru2017 with random area deletions to simplify the model & training process as much as possible) and his belief is that because genitals are such a small fraction of any images, and the rest of images vary so much while genitals are a fairly small narrow domain, a generic inpainting/denoising CNN will learn to inpaint pretty much anything else possible and neglect genitals specifically.

Presumably if you trained a really big inpainting CNN a lot, it would learn genitals (along with everything else), but it's understandable that he would try a much more targeted approach.
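For reference, the "random area deletions" setup is simple to sketch: blank out a random rectangle and train the model to restore it, using the untouched image as the target. This is only an illustration in pure Python on grayscale pixel grids; the function names and size parameters are hypothetical, and it is not the training code of the project being discussed.

```python
import random


def random_rect_mask(height, width, min_size, max_size, rng=None):
    """Return a height x width binary mask with one random rectangle set to 1.

    Assumes min_size <= max_size and min_size <= min(height, width).
    """
    rng = rng or random.Random()
    h = rng.randint(min_size, min(max_size, height))
    w = rng.randint(min_size, min(max_size, width))
    top = rng.randint(0, height - h)
    left = rng.randint(0, width - w)
    return [
        [1 if top <= y < top + h and left <= x < left + w else 0 for x in range(width)]
        for y in range(height)
    ]


def make_training_pair(image, mask, fill=0):
    """Return (corrupted, target): masked pixels are replaced by fill in the input."""
    corrupted = [
        [fill if m else p for p, m in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]
    return corrupted, image
```

Because the rectangle lands anywhere with equal probability, a small region of interest is rarely the thing being deleted, which is the imbalance concern raised above.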

So do you know what exactly the model was trained on? Unless I missed it, there's no training code in the repo, or any other indication of how data was prepared.
I'm not sure. I suggested Danbooru2017, as I mentioned, and I thought he was using it, but double-checking his Reddit comments he seems to imply he's using a custom private dataset only at this point. Maybe he hand-extracted a lot of censored/original pairs from various places.
A neural net that replaces jarring censorship with suspiciously conveniently placed objects? Hilarious...
You could be right, I don't know. Maybe the authors can tell us about their experience.

My intuition is that the dataset will be too imbalanced to learn anything useful. Even if you crawl only decensored images, the area you truly care about inpainting is still pretty small. If you don't focus on it somehow, it might learn how to inpaint anime-style geometry correctly (from the rest of the image) but produce "barbie doll" style anatomy.

The demo in the repo only shows inpainting of anime-style geometry. In the 4chan thread linked by nathansmith, someone tried to apply it to a mosaicked penis, but it disappeared (NSFW, probably: https://i.4cdn.org/g/1540580868203.png). When they used the mosaic decensoring mode instead, it worked slightly better (NSFW, definitely: https://imgur.com/a/PwAunbc/). Other people complained about missing genitals as well.

So I'm guessing that this model doesn't have any specific training for filling in only genitals. I think there are category-aware inpainting algorithms that could be used, but they'd need tagged data, as you say.
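One way to "focus" on a small region, assuming you do have tagged data, would be to up-weight that region in the reconstruction loss. Below is a minimal pure-Python sketch of a region-weighted L1 loss; `roi_mask` (1 inside the tagged region) and the `roi_weight` value are assumptions for illustration, and a real setup would compute this inside an ML framework rather than with nested loops.

```python
def weighted_l1(pred, target, roi_mask, roi_weight=10.0):
    """Weighted mean absolute error over a 2D grid.

    Pixels where roi_mask is 1 count roi_weight times as much as the rest.
    """
    total = 0.0
    weight_sum = 0.0
    for pred_row, target_row, mask_row in zip(pred, target, roi_mask):
        for p, t, m in zip(pred_row, target_row, mask_row):
            w = roi_weight if m else 1.0
            total += w * abs(p - t)
            weight_sum += w
    return total / weight_sum
```

With `roi_weight=1.0` this reduces to the plain mean absolute error, so the weight directly controls how much an error inside the tagged region costs relative to one elsewhere.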

If there’s too much manual work, you aren’t using enough layers! ;) Seems the smart approach for a... “focused” AI would be to manually label a bunch of nsfw bits and use that to build a censor bot that auto-detects and censors the of-interest regions. From there you can generate a large corpus of censored-uncensored pairs with which to train the decensor bot.
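The second half of that pipeline (turning a detected region into a censored/uncensored pair) could look roughly like this. It is a pure-Python sketch on grayscale pixel grids that applies a block-average mosaic inside a bounding box; the detector that produces `box` is not shown, and the function name and `block` size are hypothetical.

```python
def mosaic(image, box, block=4):
    """Return a copy of image with a block-average mosaic applied inside box.

    box is (top, left, bottom, right), inclusive. The input is not modified.
    """
    top, left, bottom, right = box
    out = [row[:] for row in image]
    for by in range(top, bottom + 1, block):
        for bx in range(left, right + 1, block):
            ys = range(by, min(by + block, bottom + 1))
            xs = range(bx, min(bx + block, right + 1))
            # Replace every pixel in the tile with the tile's integer mean.
            avg = sum(image[y][x] for y in ys for x in xs) // (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = avg
    return out
```

Each `(mosaic(image, box), image)` pair is then one training example. Synthetic mosaics may not match the censorship found in real releases, so a model trained only on them might not transfer well.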
Hentai is censored only for the Japanese market, because it is a legal obligation. When it is exported, the uncensored version is normally used.

So you just need to find both versions in order to make your dataset. Add the large amount of decensored artwork and it is not that hard to build a huge dataset. It may even be automatable by parsing imageboards.

Are you sure about this? How much hentai is actually exported? I get the impression that it's very little, and only a small share of the Japanese releases, given that most hentai sites carry English translations made by fan groups, which are scanned translations of what's sold in Japan.
I think some hentai doujins are licensed by Fakku, and they seem to be uncensored. I've never used Fakku though. Note that their website is NSFW.
Fakku apparently gets access to the masters: https://www.animenewsnetwork.com/answerman/2018-11-02/.13900...
It's only because censored-only hentai is popular that someone came up with this idea. I'm not a hentai expert by any means, but I'd imagine a lot of hentai is never exported, so no uncensored version is ever published.