by kvuj 41 days ago
My god, from this video I learned two things:

- Tesla's vision only approach seems a lot more competent than the Lidar suites from smaller Chinese makers. Perhaps I misjudged how necessary Lidar was to achieve safe driving.

- Virtually all of the Chinese car infotainment systems were basically 1:1 copies of Tesla's. I couldn't find any that genuinely tried something unique lol

4 comments

> - Tesla's vision only approach seems a lot more competent than the Lidar suites from smaller Chinese makers. Perhaps I misjudged how necessary Lidar was to achieve safe driving.

Three things can be simultaneously true:

* Tesla's cameras are sufficient for some scenarios.

* Tesla's cameras are insufficient for other scenarios.

* A system with good data and bad algorithmic processing is still going to be bad. The Chinese vehicles almost always fail the tests because they see the obstacle but drive into it anyway.

Yeah it's interesting hearing their engineering logic, that fewer sensor types means less sensor collision and faster iteration, where iteration speed is really what matters. I also think people overhyped lidar because they don't understand it, and human behavior is to associate things we don't understand with magic. It's not magic, it performs poorly in inclination weather and can have issues with resolution over range and data processing (although lidar does do a lot of things well).

All of this said, once Karpathy left they have slowly looked at adding new sensors (recently radar), so who knows what the future for Tesla's sensor suite holds.

> I also think people overhyped lidar because they don't understand it

Speaking as a person who understands it extremely well and who has an advanced degree in computer vision, I'm sure that internet randos did, but I promise the people who actually know about the failure modes of the different modalities did not. I don't really expect you to take my word for it, but maybe this will spark an interest in investigating the failure scenarios of 3D reconstruction using cameras in computer vision. Just know that Google is an absolute top tier juggernaut in the CV/ML/AI research world, and they don't use lidar out of ignorance.

> less sensor collision

This isn't a real thing for anyone doing a good job. A sensor can be good for a scenario or it can be bad for a scenario. More sensors feeding input only gives you gradations of accuracy instead of binary accuracy. Having gradations of accuracy is an unambiguously good thing. When you only have one sensor, you have no way to know whether in the moment it is feeding you an optical illusion. That's what it means for something to be an optical illusion. But when you have multiple sensors of different modalities, then you have meaningful information about whether local disagreement between the different modalities means that one is better or worse than the other, because you can contextually characterize the failure scenarios of each.
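
To make the "gradations of accuracy" point concrete, here is a minimal sketch of textbook inverse-variance fusion of two independent range estimates with a disagreement check. It is not any vendor's actual pipeline: the function name `fuse`, the `gate` threshold, and all the numbers are hypothetical, and it assumes each sensor reports a variance that reflects how much it should be trusted in the current conditions.

```python
import math


def fuse(z_cam, var_cam, z_lidar, var_lidar, gate=3.0):
    """Fuse two independent range estimates (metres) by inverse-variance weighting.

    Returns (fused_range, fused_variance, disagree), where `disagree` is True
    when the two readings differ by more than `gate` standard deviations of
    their difference. `gate` is an illustrative threshold, not a standard value.
    """
    # How far apart the readings are, in units of their combined uncertainty.
    sigma_diff = math.sqrt(var_cam + var_lidar)
    disagree = abs(z_cam - z_lidar) / sigma_diff > gate

    # The sensor with the smaller variance gets the larger weight.
    w_cam = 1.0 / var_cam
    w_lidar = 1.0 / var_lidar
    fused = (w_cam * z_cam + w_lidar * z_lidar) / (w_cam + w_lidar)
    fused_var = 1.0 / (w_cam + w_lidar)
    return fused, fused_var, disagree
```

With made-up inputs, `fuse(50.0, 4.0, 48.0, 1.0)` gives about 48.4 m with variance 0.8, lower than either sensor alone, and no disagreement flag. `fuse(80.0, 4.0, 30.0, 1.0)` trips the flag, which is the signal a single-sensor system never gets: something is wrong with one modality, and the context (rain, glare, range) tells you which one to distrust.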

> It's not magic, it performs poorly in inclination weather and can have issues with resolution over range and data processing (although lidar does do a lot of things well).

Inclement, not inclination. And I hate to be the bearer of bad news, but cameras also do poorly in inclement weather and have issues with resolution over range, and the solutions are identical for both (superresolution, temporal blending, alternate wavelengths, stereo correspondence, etc).
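
For the camera side of "resolution over range", a rough sketch using the standard pinhole stereo relation (depth = focal length × baseline / disparity) shows why depth error grows with the square of distance. The focal length, baseline, and quarter-pixel matching error below are assumed values for illustration only, not the specs of any real vehicle.

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    """Depth in metres from stereo disparity: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px


def depth_error(focal_px, baseline_m, depth_m, disparity_err_px=0.25):
    """First-order depth uncertainty: dZ ~= Z^2 / (f * B) * dd.

    `disparity_err_px` is an assumed sub-pixel matching error.
    """
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_err_px


# Hypothetical rig: 1000 px focal length, 0.3 m baseline.
F_PX, BASELINE_M = 1000.0, 0.3
for z in (10.0, 50.0, 100.0):
    print(f"{z:6.1f} m -> +/- {depth_error(F_PX, BASELINE_M, z):.2f} m")
```

Under these assumptions the uncertainty is under 10 cm at 10 m but over 8 m at 100 m, so a tenfold increase in range costs a hundredfold in depth precision. That is the kind of range-dependent failure that superresolution or a wider baseline is meant to push back.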

Tesla people always say (said?) things like "Well humans only drive with their eyes, so cars should be able to as well," but that's not a true statement about what humans have in relation to what Teslas have. Humans have many more different sensor modalities than what Tesla's cameras give. Teslas have single-view fixed-focus cameras that, for much of the FOV, can only reconstruct structure from shape assumptions (object detection and classification) and inter-frame changes (optical flow) coupled with sensation of the vehicle's motion. That's all they get. It's not bad at all, especially coupled with advanced machine learning, but you do have more than that coupled with even more advanced machine learning. When you as a human drive, in addition to what Teslas have (you do also have them), you also have binocular stereopsis cues, autofocus lens convergence cues, vehicle-independent motion parallax cues, and the ability to manipulate shade cover so you don't get blinded. Are all those extra cues necessary for every scenario? No, obviously not. Do they help though? Yes. Try driving with only one eye open and without moving your body or head at all. You can absolutely do it, but you won't be as good as you would with both eyes open and free movement.

The article notes that these tests were all done in daylight, where Lidar provides less of an advantage.

It'll be difficult for any single company to compete with Tesla on scale, and the AI we have so far rewards scale like no other technology before it.

Yes Waymo exists, but the amount of training data they have is a few orders of magnitude lower.

And yet Waymo is operating a real commercial service in multiple cities, and Tesla is in just one.