by cbcase 5605 days ago
I cannot understand why it is that so many obviously very intelligent people decide that we need another computer vision-based startup. Because the unfortunate truth is that computer vision (right now) doesn't work.

Let me qualify that. From the academic / research point of view, there has been a collection of real successes in computer vision in, say, the last ten years. But my sense is that what counts as a research success is a long way from what counts as a practical business success.

For example, the best generic object detector at the moment is probably Felzenszwalb's using deformable parts-based models[1]. And it's just not that good. On the latest PASCAL object detection challenge, you'll see that its mean average precision is only ~30%.

Scott Brown, the interviewee, sets Vicarious apart by highlighting the fact that their system will be neurobiologically inspired. But the idea of learning hierarchical systems that mimic the brain's visual processing system is hardly new, and the jury is still out on whether these systems can do better than the "hand-coded" systems like Felzenszwalb's. As a random example, see [2].

Like.com showed you can build a business that uses computer vision in some way. But as Brown snarks, they "use a big bag of different heuristics to figure out the image." For the time being, that seems to be the only way to get computer vision to work in practice.

That all said, I wish them luck.

[1] http://people.cs.uchicago.edu/~pff/latent/

[2] http://www.cs.stanford.edu/people/ang//papers/nips07-sparsed...

3 comments

Your argument against computer vision startups is that there isn't a viable computer vision solution at this point?
Well, his argument is that well-funded, very intelligent people are trying like hell at computer vision, and not succeeding. That's not a good sign - you'd prefer that your space has been hitherto overlooked by smart people with lots of money.
I think he's arguing that computer vision is a research subject - most startups are doing known things (in the sense of "this has been successfully done before") or at most development ("this has been successfully done before - in the lab").
And yet facial recognition is now freely available to consumers (Picasa, Facebook etc), our phones have blink detection, and 3D motion detection and tracking are available to consumers for ~$100 (Kinect).

I'm not familiar with the PASCAL object detection challenge, but I just had a quick look. It's hard - if I understand it correctly, classifiers had to categorize photos as containing 5 types of objects from the 1000 leaf nodes of http://www.image-net.org/challenges/LSVRC/2010/browse-synset.... (Based on the description from http://www.image-net.org/challenges/LSVRC/2010/pascal_ilsvrc...). I'm having trouble understanding the scoring scheme (how is flat cost calculated?), but based on this I'm quite impressed.

I'm human (yes, I swear it's true), and I couldn't classify things like different breeds of poodle: http://www.image-net.org/synset?wnid=n02113712

There are many different actual tasks that technically are PASCAL challenges, but when people say "PASCAL VOC challenge" (Visual Object Classes), they typically mean either the _classification_ or _detection_ challenge:

Classification: For each of the twenty classes, predicting presence/absence of an example of that class in the test image.

Detection: Predicting the bounding box and label of each object from the twenty target classes in the test image.

Neither uses the full ImageNet data set. Instead, they use images from 20 classes of object, like those shown here: http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2010/exam...

Here are the results: http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2010/resu...

I find the results quite impressive - especially for classification - 90%+ precision for detecting people in photos seems like a good result.
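
To make the classification vs. detection distinction concrete, here is a rough Python sketch of what each task's output looks like and how a detection might be judged against ground truth. This is my own illustration, not the official VOC evaluation code: the `Box` class, the example scores and coordinates, and the `is_correct` helper are all made up for the example. The 0.5 overlap threshold is the criterion the VOC detection task is usually described as using (intersection over union above 50%), and the real devkit handles pixel coordinates and duplicate detections more carefully than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box (hypothetical helper for this sketch)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self) -> float:
        return max(0.0, self.xmax - self.xmin) * max(0.0, self.ymax - self.ymin)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes; 0.0 if they do not overlap."""
    iw = min(a.xmax, b.xmax) - max(a.xmin, b.xmin)
    ih = min(a.ymax, b.ymax) - max(a.ymin, b.ymin)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / (a.area() + b.area() - inter)


def is_correct(detection, truth, threshold: float = 0.5) -> bool:
    """A detection counts if the label matches and the overlap exceeds the threshold."""
    label, _confidence, box = detection
    true_label, true_box = truth
    return label == true_label and iou(box, true_box) > threshold


# Classification: one confidence per class for the whole image (made-up numbers).
classification_output = {"person": 0.93, "dog": 0.08}

# Detection: a label, a confidence, and a bounding box per object (made-up numbers).
detection_output = [("person", 0.81, Box(48, 30, 200, 380))]
ground_truth = [("person", Box(50, 40, 210, 400))]

if __name__ == "__main__":
    for det, gt in zip(detection_output, ground_truth):
        print(det[0], round(iou(det[2], gt[1]), 3), is_correct(det, gt))
```

The point is just that detection has to get both the label and the location right, which is a big part of why its scores are so much lower than the classification ones.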

> I cannot understand why it is that so many obviously very intelligent people decide that we need another computer vision-based startup. Because the unfortunate truth is that computer vision (right now) doesn't work.

This seems like a really good reason to create another computer vision-based startup.

No, it seems like a very good reason to take useful/promising but improperly commercialized research and turn it into a product. A startup rarely has enough runway to do the scientific research needed to solve a problem like this.
It is certainly rare, but Numenta has been doing it for the past 6 years, and for several years before that at the Redwood Neuroscience Institute from which it spun off. In doing so, Numenta undoubtedly stands on the foundation of significant progress in academia, but still has to do a fair bit of what one might call "research engineering."
Basic R&D is a cost that successful businesses--especially small ones--tend to externalize. Microsoft does a bit, Bell Labs used to do more; but you just don't start an FTL spaceship company before basic research has established a coherent theory of FTL travel.