by eightails
1603 days ago
Sure, but they're not getting that 3D map from binocular vision. The forward camera sensors are within a few mm of each other and have different focal lengths. And the tweet thread you linked confirms it's an ML depth map: > Well, the cars actually have a depth perceiving net inside indeed. My speculation was that a binocular system might be less prone to error than the current net.
I'm just wondering whether cameras that are close to each other but have different focal lengths wouldn't give the same results.
It seems to me that this is how modern phones do background removal: the lenses are very close to each other, very unlike human eyes, but they have different focal lengths, so depth can be estimated from the differences between the images that those focal lengths cause.
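To make the two cues concrete, here is a minimal sketch under an idealized pinhole model. The function names and all numbers are made up for illustration; this is not how any phone or car actually computes depth, and it ignores blur, distortion, and everything else a real lens does.

```python
def project(x, z, focal, cam_x=0.0):
    """Pinhole projection of a point at lateral offset x and depth z
    (both in metres) for a camera sitting at lateral position cam_x.
    The result is in the same units as focal (e.g. pixels)."""
    return focal * (x - cam_x) / z


def focal_ratio(x, z, f_wide, f_tele):
    """Ratio of the image positions seen by two cameras that share the
    same position but use different focal lengths (x must be nonzero)."""
    return project(x, z, f_tele) / project(x, z, f_wide)


def disparity(x, z, focal, baseline):
    """Image shift between two identical cameras that sit `baseline`
    metres apart. Algebraically this is focal * baseline / z."""
    return project(x, z, focal) - project(x, z, focal, cam_x=baseline)


if __name__ == "__main__":
    # Hypothetical numbers: 1000 px focal length, 1 cm between the lenses.
    for depth in (5.0, 50.0):
        print(depth,
              focal_ratio(1.0, depth, 4.0, 8.0),
              disparity(0.0, depth, 1000.0, 0.01))
```

In this simplified model the focal-length difference only rescales the image by the same factor at every depth, while even a small offset between the lenses produces a shift that shrinks as the object gets farther away (about 2 px at 5 m and 0.2 px at 50 m with the numbers above).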
Also, wouldn't turning a multitude of views into a 3D map require a neural net anyway?
Whether the images differ because of different focal lengths or because of different positions seems to be essentially the same training task. In both cases, the model needs to learn "This difference in those two images means this depth".
I think with the human eye, we do the same thing. That's why some optical illusions work that confuse your perception of which objects are in front and which are in the back.
And those illusions work even though humans actually have an advantage over cheap fixed-focus cameras, in that focusing the lens on the object itself gives an indication of the object's distance. Much like you could use a DSLR as a measuring device by focusing on the object and then checking the distance markers on the lens's focus ring. Tesla doesn't have that advantage. They have to compare two "flat" images.
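The focus-ring idea can be sketched with the thin-lens equation, 1/f = 1/d_o + 1/d_i. This is a rough illustration only: the function name and the numbers are hypothetical, and real lenses are not thin lenses, so their distance scales come from the manufacturer's calibration rather than from this formula.

```python
def focus_distance(focal_mm, image_dist_mm):
    """Solve the thin-lens equation 1/f = 1/d_o + 1/d_i for the object
    distance d_o, given the focal length f and the lens-to-sensor
    distance d_i (all in millimetres)."""
    if image_dist_mm <= focal_mm:
        # Sensor at (or inside) the focal plane: focused at infinity.
        return float("inf")
    return focal_mm * image_dist_mm / (image_dist_mm - focal_mm)


if __name__ == "__main__":
    # Hypothetical 50 mm lens, racked out in 1 mm steps.
    for extension_mm in (0, 1, 2, 5):
        print(extension_mm, focus_distance(50.0, 50.0 + extension_mm))
```

Under these assumptions, moving a 50 mm lens just 1 mm away from its infinity position puts the focus at roughly 2.5 m, which is why the lens position carries distance information that a fixed-focus camera does not have.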