by GuB-42 2791 days ago
Hentai is censored only for the Japanese market, because it is a legal obligation. When it is exported, the uncensored version is normally used.

So you just need to find both versions in order to make your dataset. Add the large amount of decensored artwork and it is not that hard to build a huge dataset. It may even be automatable by parsing imageboards.

2 comments

Are you sure about this? How much hentai is actually exported? My impression is that very little is, and only a small fraction of the Japanese releases, since most hentai sites carry English translations made by fan groups from scans of what is sold in Japan.
I think some hentai doujins are licensed by Fakku, and they seem to be uncensored. I've never used Fakku though. Note that their website is NSFW.
Fakku apparently gets access to the masters: https://www.animenewsnetwork.com/answerman/2018-11-02/.13900...
It's probably only because censored-only hentai is so prevalent that someone came up with this idea. I'm not a hentai expert by any means, but I'd imagine a lot of hentai is never exported, so no uncensored version is ever published.