| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by snovv_crash 3399 days ago

Object recognition doesn't only exist in the subspace of labelled 2D images. It tends to be derived from a 3D space, which is a whole extra orthogonal data source that the "NN all the things" crowd is fastidiously ignoring.

Why, I'm not sure, but I'm guessing because it is hard/inaccurate to do with just NNs and parameter/network architecture tweaking. Possibly also because benchmarks with single mono images are much easier to make.

Just because it is hard with method A, and is harder to make benchmarks, doesn't mean method B isn't better.

1 comments

sgt101 3399 days ago

Yes but am I missing something when I say that if the problem is to deal with labelled 2d images declaring that you should be working with 3d images or short video sequences doesn't help.

Sure, if you are building a Robot and I say "use this camera and a deep network" and you say "It'll work better with stereo" well... yes super do that!

But if we are working with mono images I don't understand how the observation helps?

link

snovv_crash 3398 days ago

Quoting from GP of your original reply:

> If you want to recognize all Coke cans in all fridges, for your real-world, consumer-ready Coke-fetching robot product?

If you're stuck with a mono dataset, post collection, then sure use NN and call it a day. But even if you have video you can do 3D reconstruction just from baseline movement. You won't know scale, so you can't differentiate between big coke cans and little coke cans, but at least you can rule out pictures of coke cans.

link