Hacker News new | ask | show | jobs
by sgt101 3398 days ago
Yes but am I missing something when I say that if the problem is to deal with labelled 2d images declaring that you should be working with 3d images or short video sequences doesn't help.

Sure, if you are building a Robot and I say "use this camera and a deep network" and you say "It'll work better with stereo" well... yes super do that!

But if we are working with mono images I don't understand how the observation helps?

1 comments

Quoting from GP of your original reply:

> If you want to recognize all Coke cans in all fridges, for your real-world, consumer-ready Coke-fetching robot product?

If you're stuck with a mono dataset, post collection, then sure use NN and call it a day. But even if you have video you can do 3D reconstruction just from baseline movement. You won't know scale, so you can't differentiate between big coke cans and little coke cans, but at least you can rule out pictures of coke cans.