Hacker News new | ask | show | jobs
by SoSoRoCoCo 1988 days ago
This article is dead-on, but I think it is missing a fairly large segment of where ML is actually working well: anomaly detection and industrial defect detection.

While I agree that everyone was shocked, myself included, when we saw how well SSD and YOLO worked, the last mile problem is stagnating. What I mean is: 7 years ago I wrote an image pipeline for a company using traditional AI methods. It was extremely challenging. When we saw SSDMobileNet do the same job 10x faster with a fraction of the code, our jaws dropped. Which is why the dev ship turned on a dime: there's something big in there.

The industry is stagnated for exactly the reasons brought up: we don't know how to squeeze out the last mile problem because NNs are EFFING HARD and research is very math heavy: e.g., it cannot be hacked by a Zuck-type into a half-assed product overnight, it needs to be carefully researched for years. This makes programmers sad, because by nature we love to brute force trial-and error our code, and homey don't play that game with machine learning.

However, places where it isn't stagnating are things like vibration and anomaly detection. This is a case where https://github.com/YumaKoizumi/ToyADMOS-dataset really shines because it adds something that didn't exist before, and it doesn't have to be 100% perfect: anything is better than nothing.

At Embedded World last year I saw tons of FPGA solutions for rejecting parts on assembly lines. Since every object appears nearly in canonical form (good lighting, centered, homogeneous presentation), NN's are kicking ass bigtime in that space.

It is important to remember Self-Driving Car Magic is just the consumer-facing hype machine. ML/NNs are working spectacularly well in some domains.

4 comments

I worked and built out a proof of concept industrial defect detection system recently, with a large focus on modern DNN architectures. We worked with a plant to curate a 30000+ multi-class defect dataset, many with varying lighting and environment conditions. As you said, modifying and parameter tuning NN is not always a hopeful endeavor.

However, you can make significant gains to your models by going back to traditional image filtering/augmentation. Sticking with well researched object detectors/segmentation algorithms and putting our effort on improving the algorithms that cleans up the data takes you far. It's impossible to avoid because images will always be full of reflections, artifacts, strange coloration unless you have the perfect lighting tunnel setup; doable nonetheless.

Currently doing the image collection for a NN. Created a custom HW rig to speed things up, lighting, turntables, actuators for novel objects, the works. It's really hard and tedious. We're doing liquid detection and even under IR/UV lights, it's still really hard.

We'd love to be able to work with a company for a few days, get the parameters set up right for our case, and then let them take the thousands of images. My company would easily pay $100K+ for such a data set.

> This makes programmers sad, because by nature we love to brute force trial-and error our code, and homey don't play that game with machine learning.

Huh? If anything I would say ML is way more trial-and-error focused than imperative programming.

Well, if your only experience is reading python opencv stack overflow posts, then of course...
I mean in the sense that using ML for a problem often requires just trying a dozen different modeling techniques, then a bunch of a hyper-parameter searching, then a bunch of stochastic tuning…
Oh. I see what you mean. Yeah, I guess by definition backwards propagation is trial-end-error. Huh, I never thought of it that way. Thanks for clarifying, I thought you were being saucy: my apologies for being snarky.
It's HN; we're all here for the snark!
> However, places where it isn't stagnating are things like vibration and anomaly detection. This is a case where https://github.com/YumaKoizumi/ToyADMOS-dataset really shines because it adds something that didn't exist before, and it doesn't have to be 100% perfect: anything is better than nothing.

This is a link to a dataset, unless I'm missing something it's not about anomaly detection. I looked into this area a few years ago and always try to keep my eye open for breakthroughs... care to share any other links?

Oops! Thanks, it was from this challenge. Lots of neat stuff in here.

http://dcase.community/challenge2021/index

>The industry is stagnated for exactly the reasons brought up: we don't know how to squeeze out the last mile problem because NNs are EFFING HARD and research is very math heavy: e.g., it cannot be hacked by a Zuck-type into a half-assed product overnight, it needs to be carefully researched for years. This makes programmers sad, because by nature we love to brute force trial-and error our code, and homey don't play that game with machine learning.

Uh what? You can literally finetune a Fast.ai model overnight to be borderline SOTA on whatever problem you have data for. 0 Math involved, isn't that exactly a hacker's wet dream?

Fast.ai works pretty well when you're working on standard tasks but starts to fall apart when you want to do something more exotic.
I doubt there's much in CV for instance that couldn't be achieved easily with fast.ai. You don't need to be doing exotic things to build a product.
Never said you need an exotic model to build a product nor that you couldn't do exotic things in fast.ai. Fast.ai is just a leaky abstraction.
And yet, it might fail in actual application.

If every DGP could be captured by fast-tuning a sophisticated enough model, science probably would be solved even before DL.

We're talking about building useable products not solving science...
My point being that the reason many products end up not usable, ref. accounts in this thread, is the same reason why science isn’t solved and doing ML correctly isn’t easy.