by demosthenes111 3101 days ago
I suspect part of the problem with this approach relates to dataset construction. Even though models these days should theoretically be able to handle this task, there are clear ethical concerns and practical issues with building datasets of illegal imagery large enough for training. It really raises the question: is creating a dataset like that ethical, assuming the intention is to stop further abuse and dissemination?

There might be workarounds (like training one model for nudity and another for age), but such approaches are almost certain to have "suboptimal" performance compared to a single model trained on a relevant dataset. Maybe something like that is the cause of the performance issues discussed in the article.

2 comments

Honestly, if they stored evidence collected from previous cases, the data probably already exists in some form. As long as the data was closely guarded internally, I'm not sure if there would be an ethics problem using this for training.

The biggest problem with obscenity detection, though, is getting the context right. The AI might be able to get to the point where it can detect "naked human" with reasonably high accuracy. At the moment, however, I doubt it could detect whether the naked human would be considered obscene in current culture, e.g. the difference between "child pornography" and "famous Vietnam War photo" as alluded to in the Gizmodo article. So no matter how good their model gets, without further refinements in AI it would only be good for a "first pass", I would think.

Ethics is a tricky issue here. Not everyone agrees that mere possession of such pictures is unethical.

This still leaves the obvious legal problems of having a database of illegal pictures, but we already let law enforcement do otherwise illegal things (including the use of illegal pictures in some contexts).