by bobsomers 1827 days ago
This framing is a common error in the debate. It's not cameras or lidar, it's cameras or cameras + lidar + radar. Nobody is driving on lidar alone. Many others actually have more cameras and are doing substantially more vision than Tesla is, they're just fusing lidar and radar perception with their vision pipeline. It gives you a more robust view of the world than using a single sensor modality.

If you have one piece of rotten meat in a perfect stew, you still have a disgusting dish. A good sensor fused with a garbage sensor is still garbage in. That was one of the major points of the talk - the vision-only system is more accurate than the one with other modalities thrown in, even though the latter has more data. We intuit that the fusion network should just learn to ignore the bad sensor when it's unreliable, but this rarely happens in practice.
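A toy illustration of the "garbage in" point, assuming classical inverse-variance weighting rather than a learned fusion network (this is not what Tesla or anyone in the talk is known to use, and every number here is made up). Fusion only discounts a bad sensor if that sensor honestly reports that it is unreliable; if it is wrong and confident, the fused estimate ends up worse than the good sensor alone.

```python
def fuse(estimates):
    """Inverse-variance weighted mean of (value, claimed_variance) pairs."""
    weights = [1.0 / variance for _, variance in estimates]
    total = sum(weights)
    return sum(w * value for w, (value, _) in zip(weights, estimates)) / total


TRUE_DISTANCE_M = 50.0  # hypothetical ground truth

# Hypothetical readings: (measured distance in m, variance the sensor claims).
good_sensor = (50.5, 1.0)                # slightly off, honest about its noise
bad_sensor_honest = (20.0, 400.0)        # badly off, but admits it is unreliable
bad_sensor_overconfident = (20.0, 1.0)   # badly off, claims to be as good as the other

fused_honest = fuse([good_sensor, bad_sensor_honest])
fused_overconfident = fuse([good_sensor, bad_sensor_overconfident])

print("good sensor alone, error:", abs(good_sensor[0] - TRUE_DISTANCE_M))
print("fused, bad sensor honest, error:", abs(fused_honest - TRUE_DISTANCE_M))
print("fused, bad sensor overconfident, error:", abs(fused_overconfident - TRUE_DISTANCE_M))
```

The hard part in practice is exactly the thing this sketch assumes away: knowing the bad sensor's real variance at the moment it goes wrong.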

If anything, knowing when to reliably ignore a sensor modality is the kind of intuition more associated with general AI.

A similar paradox occurs when trying to fuse multispectral imagery. You'd think early fusion of RGB and IR would be better since it gives the higher-resolution filters access to more data, but it does worse than late fusion. My understanding is that late fusion forces the network to "work harder" to solve object detection using IR only, and only once you've wrung out what you can do you fuse with the RGB detections.
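For anyone unfamiliar with the terms, here is a minimal structural sketch of the two options. The "detectors" are hypothetical stand-ins (plain averages over toy pixel values), not real networks, so this only shows where the merge happens and says nothing about which approach scores better.

```python
def early_fusion(rgb, ir, joint_detector):
    """Stack the modalities per pixel, then run ONE detector on the stacked input."""
    stacked = [rgb_px + (ir_px,) for rgb_px, ir_px in zip(rgb, ir)]
    return joint_detector(stacked)


def late_fusion(rgb, ir, rgb_detector, ir_detector, combine):
    """Run one detector PER modality, then combine their per-pixel scores."""
    rgb_scores = rgb_detector(rgb)
    ir_scores = ir_detector(ir)
    return [combine(a, b) for a, b in zip(rgb_scores, ir_scores)]


# Hypothetical toy detectors: each returns one "objectness" score per pixel.
def toy_joint_detector(stacked):
    return [sum(px) / len(px) for px in stacked]


def toy_rgb_detector(rgb):
    return [sum(px) / 3 for px in rgb]


def toy_ir_detector(ir):
    return list(ir)


# Two toy pixels: the first is dark in RGB but hot in IR, the second the reverse.
rgb = [(0.1, 0.1, 0.1), (0.9, 0.8, 0.7)]
ir = [0.9, 0.2]

early = early_fusion(rgb, ir, toy_joint_detector)
late = late_fusion(rgb, ir, toy_rgb_detector, toy_ir_detector, combine=max)
print("early fusion scores:", early)
print("late fusion scores:", late)
```

In the late-fusion path the IR detector has to produce a usable score on its own before anything from RGB is mixed in, which is the "work harder" effect described above.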

Since radar is "one pixel" there's essentially only one object detector possible: object or nothing. If yes-object, fusion tries really hard to make sense of the RGB filters to figure out what partial detection looks like an object, which is almost always a false positive.

You have fallen victim to the trap that this video so perfectly laid out for you. Tesla used the mmWave radar that has been in cars forever since it's a good way to do emergency braking and things like adaptive cruise control, particularly when you are a new company and you need these capabilities on your luxury sedan from day one. Now that they are much further along in their FSD efforts, they realize this mmWave radar isn't very helpful anymore. Cool, but nobody else was using it to begin with. LIDAR is a totally and utterly different sensor technology.
Didn't Andrej show examples of emergency braking and how poorly radar performed v. vision?
He showed how poorly their outdated radar performed. The rest of the industry uses newer, vastly superior radars.
What radar did Tesla use? What does the rest of the industry use?
They are using Continental 2D radars from 2014. The rest of the SDC industry uses what is called 4D high resolution imaging radars. That's why Teslas have poor performance like phantom braking under bridges.
> That was one of the major points of the talk - the vision-only system is more accurate than the one with other modalities thrown in, even though the latter has more data. We intuit that the fusion network should just learn to ignore the bad sensor when it's unreliable, but this rarely happens in practice.

This makes some important assumptions, namely that Tesla built lidar and radar perception pipelines and sensor fusion of equivalent quality to their competitors', and then decided they were unnecessary.

Given that their competitors have shown substantially better perception than Tesla, and that Tesla has a significant economic incentive to deliver autonomous driving on a sensor suite that already shipped years ago, I find that difficult to believe. Did Tesla build good enough perception to dismiss lidar and radar purely on their merits? Unlikely, I think. Did an intern build a student-quality lidar pipeline that "proved" Elon's camera-first approach is the right one? More likely.

Karpathy joined Tesla way before they ditched radar. You’re saying it’s more likely that he based his work on a “student-quality” prototype?
>We intuit that the fusion network should just learn to ignore the bad sensor when it's unreliable, but this rarely happens in practice.

That sounds like a problem with the network's architecture and not the data itself

Sensor fusion is hard, especially when the data varies in quality (different generations of LIDAR and radar; images are probably more portable, actually).

I am actually on the sensor-fusion side, and think a transformer can merge everything and generate a coherent world-view. But this is a hard problem, and people who sidestep it after evaluating it shouldn't be dismissed out of hand.

For one:

> It gives you a more robust view of the world than using a single sensor modality.

How can we evaluate that correctly?

One of the arguments made in this talk is which sensor to trust when there is a disagreement. I'd love to see the scenarios where this happens with three types of sensors and the logic they use to choose the "right" one.
You can't avoid this problem by only having a single sensor modality, because you still have to decide if you trust your sensor data or not. In effect, you actually made the problem harder because you have no other information from any other sensor modalities to help understand the world around you.

In effect, they solved the disagreement problem by sticking their head in the sand and pretending their sensors are never wrong.

You’re intent on oversimplifying the issue to support your view, but this comment makes no sense. Of course you can avoid sensor fusion issues by not doing sensor fusion. Regardless of how many sources you have, you still have to decide if you trust the data; having fewer sensors doesn’t change that. And having more sensors means you also have to decide which input to trust.
That's a problem for deciding which system drives the thing (e.g. do you use cameras for detecting oncoming cars and objects, or lidar/radar), but if there are major disagreements between them it would make most sense to disengage the autonomous part and ask the human driver to take over, much like current Tesla cars do. Obviously not possible if you want level 4 self-driving, but it can be good enough for level 3.
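A minimal sketch of that "disengage on major disagreement" rule, assuming each modality independently reports a distance to the same lead object. The function name, the sensor names, and the 5 m threshold are all hypothetical; a real system would compare far richer outputs than a single number.

```python
def plan_action(distances_m, max_spread_m=5.0):
    """Decide whether to keep driving or hand over, given per-sensor distances.

    distances_m: dict mapping a sensor name to its distance estimate in metres.
    max_spread_m: hypothetical threshold for what counts as a major disagreement.
    """
    if len(distances_m) < 2:
        raise ValueError("need at least two sensors to detect a disagreement")
    values = list(distances_m.values())
    spread = max(values) - min(values)
    if spread > max_spread_m:
        # The sensors disagree badly: do not pick a winner, hand over instead.
        return "handover_to_driver"
    return "continue_autonomous"


print(plan_action({"camera": 40.0, "lidar": 41.0, "radar": 39.5}))
print(plan_action({"camera": 40.0, "lidar": 41.0, "radar": 12.0}))
```

Note that the rule never decides which sensor is right, it only detects that they cannot all be right, which is why it stops working once there is no driver to hand over to.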
> Many others actually have more cameras and are doing substantially more vision than Tesla

This is actually my main criticism of Tesla's approach. They don't seem to have enough cameras to do the job well, and it's showing in a lot of actual system limitations.