| The biggest difference between our two projects is the effort you've put into training your own model, which I think is amazing. One of the technical issues you pointed out is that a model trained on still images shouldn't be expected to work on video. While I did not train a custom model for this project, I'm currently working on another DNN model for a completely different purpose, where I think feeding frame deltas into the model will improve the outcome. As a hobbyist, I would reckon that for porn and the like, analysing frozen frames is probably just enough. For violence, however, I would agree with you that some effort to encode motion would be essential. Focussing on NSFW content generally, I would guess that, depending on the scale of your project, you will forever run into 'edge cases' for NSFW images, even before you run into the soft wall of subjectivity. I agree that the tech is improving all the time, and I think something like this can be made truly useful one day. Possibly soon. But it would need a large, active development team, a great deal more compute, and a LOT of data. In much the same way that no home/garage coder can hope to put together a model like GPT-3 right now, I would think that a foolproof NSFW classifier would need more resources than you or I have access to at this moment. But things change all the time. Thinking about what you're doing, one thing I might suggest, if you have time to develop it, is to add some kind of 'recording' mechanism to your plugin, so that the users themselves can add to your dataset... But you have to wonder how many users will allow that! XD I'm also wondering if a Firefox extension is the best place for your model? To that end, I would suggest putting the app on a server (which is what I originally wanted to do with my hack), which will give you the opportunity to crowdsource data collection. 
People might be more willing to volunteer data in that way (similar to how people use https://builtwith.com/). You're also very welcome to take the UX work I've done on this open-source project (because this hack was ultimately just a UX experiment) and plug your model in. If your model and trained weights are available, I'd like to try to create a branch myself, if I have time. Also, as a hobbyist building knowledge, I hadn't heard of `EfficientNet Lite` before.
I'd been considering Darknet - https://pjreddie.com/darknet/ for embedded stuff until reading your post. |
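Regarding frame deltas: as a rough illustration of the idea in the quoted comment, here is a minimal sketch, assuming grayscale frames stored as nested lists of floats in [0, 1]. The names `frame_delta` and `stack_with_delta` are hypothetical and not taken from either project; a real pipeline would more likely use array or tensor types.

```python
from typing import List

# Assumption: a grayscale frame is a list of rows of pixel intensities in [0, 1].
Frame = List[List[float]]


def frame_delta(prev: Frame, curr: Frame) -> Frame:
    """Per-pixel signed difference between two consecutive frames."""
    same_height = len(prev) == len(curr)
    if not same_height or any(len(p) != len(c) for p, c in zip(prev, curr)):
        raise ValueError("frames must have the same shape")
    return [
        [c - p for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def stack_with_delta(prev: Frame, curr: Frame) -> List[Frame]:
    """Return [current frame, delta] as a 2-channel, channels-first input."""
    return [curr, frame_delta(prev, curr)]
```

Static regions come out as zeros in the delta channel, so whatever moved between the two frames stands out without the model having to infer it from a single still.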
Regarding the current state of tech: I agree the tech still has quite a ways to go. I think one of the most interesting aspects here is how e.g. NSFW.js can get extremely high accuracy - but not necessarily perform better in the real world. I think it speaks in part to the nature of how CNNs work, the nature of the data, and the difficulty of the problem. Still, having seen how incredibly good "AI" has gotten in the last decade, I have quite a bit of hope here.
Regarding putting it on a server: that is indeed a fair question, but my desire is to keep the scanning on the client side for the user. In fact, it was the confluence of Firefox's webRequest response filtering (which is why I didn't make a Chrome version) and TensorFlow.js that finally allowed me to move from dream to reality; I had been waiting for those pieces before then. I can't afford server infrastructure if the user base grows, and people don't want to route all their pictures to me. So I guess I see the current way it works as a bonus, not a flaw - but it DOES impact performance, certainly.
Regarding data collection with respect to a server - yes, this is something I've contemplated (there's a GitHub issue if you're curious). There are, however, two things that I've long mulled over: privacy and dark psychological patterns. Let me explain a bit. On the privacy front - it is likely not legal for a user to share the image data directly due to copyright, so they need to share by URL. This can have many issues when considering e.g. authenticated services, but one big one is that the URL may have relatively sensitive user-identifying information buried in its path. I can try to be careful here, but this absolutely precludes sharing this type of URL data as an open dataset. On the psychological dark patterns front - while I'm fine with folks wanting to submit false positives, I think there's a very real chance some will want to go flag all the images they can find that are false negatives (e.g. porn). I don't think that type of submission is particularly good for their mental health or mine. So, in general, I think user image feedback is something that would be quite powerful but needs a lot of care in how it would be approached.
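To make the URL concern concrete, here is a minimal sketch of what "being careful" might look like before storing a submitted URL. It drops credentials, query strings, and fragments, and masks path segments that look like identifiers. The function name `scrub_url` and the `_ID_LIKE` heuristic are hypothetical and purely illustrative; they are not part of the addon, and a heuristic like this would certainly miss cases.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Hypothetical heuristic: long digit runs, long hex strings, or UUID-like
# tokens in a path segment are treated as possible user identifiers.
_ID_LIKE = re.compile(r"^(?:\d{5,}|[0-9a-fA-F]{16,}|[0-9a-fA-F-]{32,})$")


def scrub_url(url: str) -> str:
    """Return a reduced form of `url` with obvious identifying parts removed."""
    parts = urlsplit(url)

    # Rebuild the host without any "user:password@" prefix.
    host = parts.hostname or ""
    if ":" in host:  # IPv6 literal: restore the brackets urlsplit removed
        host = f"[{host}]"
    if parts.port:
        host = f"{host}:{parts.port}"

    # Mask path segments that match the identifier heuristic.
    segments = [
        "<id>" if _ID_LIKE.match(segment) else segment
        for segment in parts.path.split("/")
    ]

    # Empty strings drop the query string and the fragment entirely.
    return urlunsplit((parts.scheme, host, "/".join(segments), "", ""))
```

Even with scrubbing like this, a path can still carry usernames or other readable identifiers, so it would reduce the risk for private storage rather than make the URLs safe to publish as an open dataset.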
Regarding the UX - thanks! And you're welcome to try the model as well - I've tried to include enough detail and data to allow others to integrate as they wish: https://github.com/wingman-jr-addon/model/tree/master/sqrxr_... Also, let us know how things go if you try out Darknet.
Good luck!