by arjo129 3390 days ago
They work well; it's just that you need a lot of patience (and know-how) to work with them. Also, GPUs are expensive. By the time you realize that you messed up, you have wasted a lot of time. Of course, this is true of any ML algorithm out there. But what I'm trying to say is that an as-yet-unknown method may exist that is less computationally complex.

One of the problems I see is that people abuse deep neural networks no end. One doesn't need to train a deep nn for recognizing structured objects like a coke can in a fridge. Simple HOG/SIFT/other feature engineering may be a faster and better bet for small-scale object recognition. However, expecting SIFT to outperform a deep neural net on ImageNet is out of the question. Thus, when it comes to deploying systems in a short time frame, one should keep an open mind.


> One doesn't need to train a deep nn for recognizing structured objects like a coke can in a fridge.

I disagree. Sure, you don't need a NN to recognize one Coke can in one fridge for your toy robot project. If you want to recognize all Coke cans in all fridges, for your real-world, consumer-ready Coke-fetching robot product? You're going to need a huge dataset of all the various designs of Coke cans out there, in all the different kinds of refrigerators, and your toy feature-engineered approach is going to lose to a NN on that kind of varied dataset.

Which is why you should do stereo or SfM, make a 3D reconstruction, and then use HOG or some 3D feature to recognise the coke can.

Trying to do it from images with a NN that doesn't comprehend 3D space is just silly.

I'm not sure if you're serious or throwing some very excellent shade.

I'm serious. If you have to rely on mono, single-image inputs, then yeah, an ImageNet-style NN is going to do better. But it will also mistake every picture of a coke can for the real thing. It will be horrifically sensitive to malicious inputs. Much better would be to use 2 calibrated lenses and do 3D reconstruction, even if you're just doing the reconstruction as a sanity check for a NN to weed out the false positives.

Errm, hang on, are you saying that if you have a task of classifying unseen images given a labelled training set, you should get a stereo camera or video camera and create another problem?

Which you can solve?

Because the problem is silly?

What if I say: "I will give you $10m to solve it, and if you fail, I will kill this very kind old monkey"?

Object recognition doesn't only exist in the subspace of labelled 2D images. It tends to be derived from a 3D space, which is a whole extra orthogonal data source that the "NN all the things" crowd is fastidiously ignoring.

Why, I'm not sure, but I'm guessing because it is hard/inaccurate to do with just NNs and parameter/network architecture tweaking. Possibly also because benchmarks with single mono images are much easier to make.

Just because it is hard with method A, and is harder to make benchmarks, doesn't mean method B isn't better.

Yes, but am I missing something when I say that, if the problem is to deal with labelled 2D images, declaring that you should be working with 3D images or short video sequences doesn't help?

Sure, if you are building a robot and I say "use this camera and a deep network" and you say "It'll work better with stereo", well... yes, super, do that!

But if we are working with mono images I don't understand how the observation helps?

Quoting from GP of your original reply:

> If you want to recognize all Coke cans in all fridges, for your real-world, consumer-ready Coke-fetching robot product?

If you're stuck with a mono dataset, post collection, then sure, use an NN and call it a day. But even if you only have video, you can do 3D reconstruction just from baseline movement. You won't know scale, so you can't differentiate between big coke cans and little coke cans, but at least you can rule out pictures of coke cans.
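To make the "rule out pictures of coke cans" idea concrete, here is a minimal sketch of a scale-free flatness check. Everything in it is an assumption for illustration: it presumes some stereo or SfM step has already produced a list of (x, y, z) points for the detected region, and the names `fit_plane`, `flatness_ratio`, `looks_like_flat_picture` and the 0.05 threshold are hypothetical. It fits a plane z = a*x + b*y + c by least squares and compares the residual to the overall spread of the points. Since both scale together, the ratio does not depend on the unknown scale of the reconstruction.

```python
import math

def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c to (x, y, z) points."""
    n = len(points)
    sx = sum(x for x, _, _ in points)
    sy = sum(y for _, y, _ in points)
    sz = sum(z for _, _, z in points)
    sxx = sum(x * x for x, _, _ in points)
    sxy = sum(x * y for x, y, _ in points)
    syy = sum(y * y for _, y, _ in points)
    sxz = sum(x * z for x, _, z in points)
    syz = sum(y * z for _, y, z in points)
    # Normal equations A @ (a, b, c) = rhs, solved with Cramer's rule.
    A = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    det = _det3(A)
    if det == 0:
        raise ValueError("degenerate point set: cannot fit a plane")
    coeffs = []
    for col in range(3):
        m = [row[:] for row in A]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(_det3(m) / det)
    return tuple(coeffs)

def flatness_ratio(points):
    """RMS plane residual divided by RMS spread; unchanged by uniform scaling."""
    a, b, c = fit_plane(points)
    n = len(points)
    residual = math.sqrt(
        sum((z - (a * x + b * y + c)) ** 2 for x, y, z in points) / n)
    cx = sum(x for x, _, _ in points) / n
    cy = sum(y for _, y, _ in points) / n
    cz = sum(z for _, _, z in points) / n
    spread = math.sqrt(
        sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
            for x, y, z in points) / n)
    return residual / spread

def looks_like_flat_picture(points, threshold=0.05):
    """Hypothetical threshold: a small ratio means the surface is nearly planar."""
    return flatness_ratio(points) < threshold
```

A detection whose points pass `looks_like_flat_picture` would be treated as a photo or poster rather than a real can. The fit measures residuals along z only, so it would not handle a surface seen almost edge-on; a real system would want an orthogonal fit and some robustness to outliers.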

But an NN can completely mess up when a new refrigerator is used that wasn't part of the training set.

Also, the training is very asymmetric, since there are many more things NOT coke cans than there are coke cans.
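A small illustration of why that asymmetry matters, with made-up numbers: the 10-in-1000 split and the function names below are assumptions for the sake of the example, not anything measured. A "detector" that never says "coke can" still scores very high accuracy on such data, which is why accuracy alone says little here.

```python
def accuracy(labels, preds):
    """Fraction of predictions that match the labels."""
    return sum(l == p for l, p in zip(labels, preds)) / len(labels)

def recall(labels, preds):
    """Fraction of true positives (real coke cans) that were found."""
    positives = [p for l, p in zip(labels, preds) if l]
    return sum(positives) / len(positives)

# Hypothetical dataset: 10 coke cans among 1000 image crops.
labels = [True] * 10 + [False] * 990

# A degenerate "detector" that always answers "not a coke can".
always_no = [False] * len(labels)

print(accuracy(labels, always_no))  # 0.99
print(recall(labels, always_no))    # 0.0
```

The always-negative baseline is right 99% of the time and finds zero cans, so any evaluation of the real detector has to look at recall (or precision/recall together) rather than raw accuracy.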

> But an NN can complete mess up when a new refrigerator is used, that wasn't part of the training set

Not if your training set is representative. And this is just as true of feature-engineered approaches. The only difference is that dealing with real-world variation requires a lot less work with NNs, because once you add the variation to your dataset you're done. With feature engineering that's only the first step, because now you have to figure out where the new variation is breaking your features and how to modify them to fix it.

"Not if your training set is representative."

And herein lies a prominent failure mode of a huge amount of this sort of work that I've seen: it's hard to just "add the variation to your dataset" when your dataset is one or more orders of magnitude too small to contain it. At that point all that remains is the handwaving.

The right response to insufficient data is usually simplifying the modeling.