| > I'm just wondering if using cameras that are close to each other, but use different focal lengths, doesn't give the same results

I can see why it might seem that way intuitively, but different focal lengths won't give any additional information about depth, just the potential for more detail. If no other parameters change, an increase in focal length is effectively the same as cropping in from a wider FOV. Other things like depth of field will only change if e.g. the distance between the subject and camera is changed as well. The additional depth information provided by binocular vision comes from parallax [0].

> Also, wouldn't turning a multitude of views into a 3D map require a neural net anyway?

Not necessarily, you can just use geometry [1]. Stereo vision algorithms have been around since the 80s or earlier [2]. That said, machine learning also works and is probably much faster. Either way, the results should in theory be superior to monocular depth perception through ML, since additional information is being provided.

> It seems to me that this is how modern phones are doing background removal: The lenses are very close to each other, very unlike the human eye. But they have different focal lengths, so depth can be estimated based on the diff between the images caused by the different focal lengths.

Like I said, changing focal length makes no difference other than 'zooming'. There's no further depth information to get, except for a tiny parallax difference I suppose. Emulation of background blur can certainly be done with just one camera through ML, and I assume this is the standard way of doing things, although implementations probably vary. Some phones also use time-of-flight sensors, and Google uses a specialised kind of AF photosite (dual-pixel PDAF) to assist their single sensor, again taking advantage of parallax [3]. Unfortunately I don't think the Tesla sensors have any such PDAF pixels.
This is also why portrait modes often get small things wrong, and don't blur certain objects (e.g. hair) properly. Obviously such mistakes are acceptable in a phone camera, less so in an autonomous car.

> And those illusions work even though humans actually have an advantage over cheap fixed-focus cameras, in that focusing the lens on the object itself gives an indication of the object's distance

If you're referring to differences in depth of field when comparing a near vs far focus plane, yeah that information certainly can be used to aid depth perception. Panasonic does this with their DFD (depth-from-defocus) system [4]. As you say though, not practical for Tesla cameras.

[0] https://en.wikipedia.org/wiki/Binocular_disparity
[1] https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.36...
[2] https://www.ri.cmu.edu/pub_files/pub3/lucas_bruce_d_1981_2/l...
[3] https://ai.googleblog.com/2017/10/portrait-mode-on-pixel-2-a...
[4] https://www.dpreview.com/articles/0171197083/coming-into-foc... |
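To make the focal length vs. parallax point concrete, here is a minimal sketch assuming an ideal pinhole camera with no lens distortion. All function names, the focal lengths in pixels, and the 0.1 m baseline are made up for illustration and are not taken from any phone or Tesla pipeline.

```python
import math

def project(x, z, focal_px, cam_x=0.0):
    """Horizontal image coordinate (pixels) of a point at lateral position x
    and depth z (metres), for an ideal pinhole camera at lateral offset cam_x."""
    return focal_px * (x - cam_x) / z

def zoom_ratio(x, z, f_wide, f_tele):
    """Ratio of image coordinates when only the focal length changes.
    Assumes x != 0 so the wide-angle coordinate is non-zero."""
    return project(x, z, f_tele) / project(x, z, f_wide)

def disparity(x, z, focal_px, baseline):
    """Shift in image coordinate between two cameras `baseline` metres apart."""
    return project(x, z, focal_px) - project(x, z, focal_px, cam_x=baseline)

def depth_from_disparity(disp_px, focal_px, baseline):
    """Invert disparity = focal_px * baseline / z to recover depth."""
    return focal_px * baseline / disp_px

# Two hypothetical scene points as (x, z) in metres: one near, one far.
near = (0.5, 2.0)
far = (3.0, 20.0)

for name, (x, z) in (("near", near), ("far", far)):
    ratio = zoom_ratio(x, z, f_wide=1000.0, f_tele=2000.0)
    disp = disparity(x, z, focal_px=1000.0, baseline=0.1)
    recovered = depth_from_disparity(disp, focal_px=1000.0, baseline=0.1)
    print(f"{name}: zoom ratio {ratio:.3f}, disparity {disp:.2f} px, "
          f"recovered depth {recovered:.2f} m")
```

The zoom ratio comes out the same for both points, so comparing a wide and a tele image from the same position says nothing about which point is closer. The disparity, on the other hand, is ten times larger for the point that is ten times nearer, which is the parallax signal that stereo geometry inverts to get depth.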
This is also why some people have each eye optimized for a different focal length when getting laser eye surgery. Once your lens is too stiff from age to refocus, having two differently focused eyes won't provide any additional depth perception, but it will give you more detail at different distances.