I wonder if they're using Lidar on their test vehicles in order to improve stereoscopic image analysis to the point where they can use cameras instead.
As far as I know, stereo vision has problems with uniform surfaces (e.g. a snow-covered road). Since any two points on a uniform surface have similar descriptors, it's pretty hard to match them between the left and right images.
There are some solutions to this problem (using a Markov Random field or some assumptions about the surface), but I'm not sure how reliable they are.
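To make the matching problem concrete, here is a toy sketch (not anyone's production matcher) that runs a sum-of-absolute-differences window search along a single image row. The names `sad_costs`, `count_best_matches` and `make_pair`, and the 1-D rows themselves, are made up for illustration. It counts how many candidate disparities tie for the best cost on a textured row, a uniform row, and a striped row.

```python
import random

def sad_costs(left, right, x, half_window, max_disparity):
    """SAD cost of matching the window around left[x] against the window
    around right[x - d], for every candidate disparity d."""
    costs = []
    for d in range(max_disparity + 1):
        cost = sum(abs(left[x + k] - right[x + k - d])
                   for k in range(-half_window, half_window + 1))
        costs.append(cost)
    return costs

def count_best_matches(costs):
    """Number of candidate disparities that tie for the lowest cost."""
    best = min(costs)
    return sum(1 for c in costs if c == best)

def make_pair(base, disparity, width):
    """Cut two rows out of one signal so that left[x] == right[x - disparity]."""
    left = base[:width]
    right = base[disparity:disparity + width]
    return left, right

WIDTH, TRUE_DISPARITY, MAX_DISPARITY = 40, 3, 8
rng = random.Random(0)  # fixed seed so the toy texture is reproducible
scenes = {
    # varied intensities, like a road with visible texture
    "textured": [rng.randrange(256) for _ in range(WIDTH + TRUE_DISPARITY)],
    # constant intensity, like fresh snow under flat lighting
    "uniform": [128] * (WIDTH + TRUE_DISPARITY),
    # repeating bars with a period of 4 pixels
    "stripes": [255 if (i // 2) % 2 == 0 else 0
                for i in range(WIDTH + TRUE_DISPARITY)],
}

results = {}
for name, base in scenes.items():
    left, right = make_pair(base, TRUE_DISPARITY, WIDTH)
    costs = sad_costs(left, right, x=20, half_window=2,
                      max_disparity=MAX_DISPARITY)
    results[name] = count_best_matches(costs)
    print(name, costs, "tied best matches:", results[name])
```

On the uniform row every candidate disparity has the same cost, so the matcher has nothing to choose from. On the striped row the cost repeats with the period of the pattern, so several disparities tie. Only the textured row gives a single clear minimum.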
We have pretty high retina resolution. To feel the limits of our stereo machinery, look at any monotone surface under monotone lighting, at a distance where you can't distinguish any local mini-features of the surface (a smooth ceiling or something like it). You can feel your eyes strain as they try, and fail, to find the correct focus distance to pick out features and get stereo working by matching those features between the left and right channels. You get the same feeling when you look at, say, a surface patterned with vertical bars, where the eyes/brain have a harder time matching specific bars between the left and right eye. In both cases the surface should be large enough to cover the focus spot in your vision space, so that the eyes/brain can't get help from the position of the surface relative to other objects.
Those are actually the same issues one encounters when developing computer stereo :) Modern cameras have reached retina-level resolution, and today's computing power can do stereo matching at those resolutions, which is why we have started to get meaningful results. Computer stereo will surpass human stereo because cameras can have higher resolution and higher sensitivity, and they can see in wavelengths beyond the visible. Plus you can "paint" the objects ahead of you with IR, for example, so the stereo match would be even easier.
In addition to trhway's answer, I'd say we use a lot of assumptions about the world: houses usually have flat, orthogonal walls, streets are flat, etc. Of course, we can have the computer learn those assumptions, but it's more complicated than pure dense stereo.
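As a rough illustration of one such assumption, here is a sketch of a "flat road" prior. For a planar road seen by a camera without roll, disparity varies linearly with image row, so you can fit a line to the few rows where matching worked (lane markings, tyre tracks) and use it to fill in the rows where it didn't (snow). The function names and the sample numbers are hypothetical, chosen only to show the idea.

```python
def fit_line(rows, disparities):
    """Least-squares fit of disparity = slope * row + intercept."""
    n = len(rows)
    mean_v = sum(rows) / n
    mean_d = sum(disparities) / n
    cov = sum((v - mean_v) * (d - mean_d) for v, d in zip(rows, disparities))
    var = sum((v - mean_v) ** 2 for v in rows)
    slope = cov / var
    intercept = mean_d - slope * mean_v
    return slope, intercept

def predict_disparity(row, slope, intercept):
    """Disparity the flat-road model expects at a given image row."""
    return slope * row + intercept

# Hypothetical rows where matching succeeded, with their measured disparities.
reliable_rows = [200, 260, 320, 400]
reliable_disparities = [5.0, 11.0, 17.0, 25.0]

slope, intercept = fit_line(reliable_rows, reliable_disparities)

# Fill in a textureless row using the planar assumption.
snow_row = 300
print(predict_disparity(snow_row, slope, intercept))
```

This only works as long as the assumption holds; a kerb, a pothole or a snowdrift on that row would be silently flattened away, which is part of why it's harder to trust than pure dense stereo.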