Fundamentally, what you have here is a simultaneous localization and mapping (SLAM) problem.
It's a problem that benefits from incorporating data from multiple sources whose errors are not correlated. The way the authors focus purely on the computer vision aspects is idiotic; I know nothing about how Tesla approaches this, but I would almost guarantee that they also consume map data, vehicle kinematics, and so on when updating their model.
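To make the uncorrelated-errors point concrete, here's a rough sketch of the textbook minimum-variance way to combine two unbiased estimates of the same quantity. The numbers and the names (`fused_variance`, `camera_var`, `lidar_var`) are made up for illustration; this is not a claim about how any real vehicle stack does its fusion.

```python
def fused_variance(var_a, var_b, cov=0.0):
    """Variance of the best linear unbiased combination of two estimates.

    var_a, var_b: error variances of the two sources.
    cov: covariance between their errors (0.0 means uncorrelated).
    """
    return (var_a * var_b - cov ** 2) / (var_a + var_b - 2.0 * cov)


def fuse_independent(value_a, var_a, value_b, var_b):
    """Inverse-variance weighted average, valid when the errors are uncorrelated."""
    weight_a = 1.0 / var_a
    weight_b = 1.0 / var_b
    value = (weight_a * value_a + weight_b * value_b) / (weight_a + weight_b)
    return value, 1.0 / (weight_a + weight_b)


# Hypothetical range-to-obstacle readings: metres, with variances in m^2.
camera_range, camera_var = 41.0, 9.0
lidar_range, lidar_var = 38.0, 1.0

fused_range, fused_var = fuse_independent(
    camera_range, camera_var, lidar_range, lidar_var
)

# Same two sources, but with errors that share a common cause (covariance 1.0).
correlated_var = fused_variance(camera_var, lidar_var, cov=1.0)
```

With uncorrelated errors the fused variance comes out below that of the better sensor alone (about 0.9 versus 1.0 here). With the correlated errors in the last line, the second source adds nothing: the fused variance is back to 1.0. That is the whole argument for wanting sensors that fail in different ways.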
There is no world in which it doesn't help enormously to add data that lets you discriminate between (to pick a not-entirely-random example) the white roof of an overturned semi, which is an obstacle, and a bright patch of sky. The suggestion that LIDAR is only relevant to pre-mapped areas is ... bizarre and nonsensical.
The Tesla bet is just that "good enough" can be achieved with fewer sensor sources. That's it.