HN Mirror


by kortex 1823 days ago

If you have one piece of rotten meat in a perfect stew, you still have a disgusting dish. A good sensor fused with garbage input still gives you garbage output. That was one of the major points of the talk - the vision-only system is more accurate than the one with other modalities thrown in, even though the latter has more data. We intuit that the fusion network should just learn to ignore the bad sensor when it's unreliable, but this rarely happens in practice.
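
A toy illustration of the "rotten meat" point, under loud assumptions: two range sensors measure the same object, one with 0.5 m noise and one with 10 m noise (both numbers made up). A fixed equal-weight average gets dragged far off by the bad sensor. Classic inverse-variance weighting almost ignores the bad sensor, but only because it is handed the true noise levels. A fusion network has to infer that reliability on its own, which is the part the comment says rarely happens in practice. This is a classical estimation sketch. It is not a neural network and not Tesla's pipeline.

```python
import random
import statistics


def naive_fusion(a: float, b: float) -> float:
    """Equal-weight average: trusts both sensors the same, whatever their quality."""
    return (a + b) / 2


def inverse_variance_fusion(a: float, var_a: float, b: float, var_b: float) -> float:
    """Weight each reading by 1/variance, the classic minimum-variance combination."""
    w_a, w_b = 1.0 / var_a, 1.0 / var_b
    return (w_a * a + w_b * b) / (w_a + w_b)


def simulate(n: int = 10_000, seed: int = 0) -> dict:
    """Mean absolute range error for good-only, naive, and reliability-weighted fusion."""
    rng = random.Random(seed)
    true_range = 50.0            # hypothetical distance to an object, in metres
    good_sd, bad_sd = 0.5, 10.0  # assumed noise levels: one good sensor, one "rotten" one
    errors = {"good_only": [], "naive": [], "weighted": []}
    for _ in range(n):
        good = true_range + rng.gauss(0.0, good_sd)
        bad = true_range + rng.gauss(0.0, bad_sd)
        errors["good_only"].append(abs(good - true_range))
        errors["naive"].append(abs(naive_fusion(good, bad) - true_range))
        fused = inverse_variance_fusion(good, good_sd**2, bad, bad_sd**2)
        errors["weighted"].append(abs(fused - true_range))
    return {name: statistics.mean(errs) for name, errs in errors.items()}


if __name__ == "__main__":
    for name, mae in simulate().items():
        print(f"{name:>10}: {mae:.3f} m")
```

With these assumed noise levels, good_only and weighted should both come out at roughly 0.4 m and naive at roughly 4 m. The bad sensor makes the naive blend about ten times worse than using the good sensor alone.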

If anything, knowing when to reliably ignore a sensor modality is the kind of intuition more associated with general AI.

A similar paradox occurs when trying to fuse multispectral imagery. You'd think early fusion of RGB and IR would be better, since it gives the higher-resolution filters access to more data, but it does worse than late fusion. My understanding is that late fusion forces the network to "work harder" to solve object detection using IR alone, and only once you've wrung out what you can do you fuse with the RGB detections.
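
The two pipelines can be contrasted in a short sketch. Early fusion stacks IR as an extra input channel in front of a single detector. Late fusion runs a detector per modality and then merges the resulting boxes. Several choices here are assumptions made for illustration: the IoU matching, the 0.5 threshold, and the noisy-OR score combination. The comment does not say how its detections were merged, and the detectors themselves are left out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    x1: float
    y1: float
    x2: float
    y2: float
    score: float  # detector confidence in [0, 1]


def early_fuse_input(rgb: List[List[Tuple[float, float, float]]],
                     ir: List[List[float]]) -> List[List[Tuple[float, ...]]]:
    """Early fusion: append IR as a 4th channel so one detector sees RGB+IR pixels together."""
    return [[(*px, ir_px) for px, ir_px in zip(rgb_row, ir_row)]
            for rgb_row, ir_row in zip(rgb, ir)]


def iou(a: Detection, b: Detection) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a.x1, b.x1), max(a.y1, b.y1)
    ix2, iy2 = min(a.x2, b.x2), min(a.y2, b.y2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def late_fuse(ir_dets: List[Detection], rgb_dets: List[Detection],
              iou_thresh: float = 0.5) -> List[Detection]:
    """Late fusion: each modality is detected separately, then boxes are matched by IoU."""
    fused: List[Detection] = []
    unmatched_rgb = list(rgb_dets)
    for d in ir_dets:
        best = max(unmatched_rgb, key=lambda r: iou(d, r), default=None)
        if best is not None and iou(d, best) >= iou_thresh:
            unmatched_rgb.remove(best)
            # Both modalities agree: treat the scores as independent evidence (noisy-OR).
            score = 1.0 - (1.0 - d.score) * (1.0 - best.score)
            fused.append(Detection(d.x1, d.y1, d.x2, d.y2, score))
        else:
            fused.append(d)  # IR-only detection is kept as-is
    fused.extend(unmatched_rgb)  # RGB-only detections are kept as-is
    return fused
```

In the late-fusion version, the IR detector has to stand on its own before any RGB evidence is added. That is the "work harder" effect the comment describes.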

Since radar is "one pixel", there's essentially only one object detector possible: object or nothing. If the radar says yes-object, fusion tries really hard to make sense of the RGB filters to figure out which partial detection looks like an object, and that is almost always a false positive.
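
Here is a toy model of the failure mode described above, under stated assumptions. Radar contributes a single yes/no bit, and the camera produces scored candidate regions. The fusion rule is hypothetical and not taken from any real stack: whenever radar fires and no camera candidate clears the threshold, it promotes the best camera candidate anyway.

```python
from typing import List, Tuple

Candidate = Tuple[str, float]  # (hypothetical region label, RGB detector score)


def fuse_binary_radar(radar_hit: bool, rgb_candidates: List[Candidate],
                      threshold: float = 0.5) -> List[Candidate]:
    """Toy fusion of a one-bit radar signal with image-based candidate regions."""
    confident = [c for c in rgb_candidates if c[1] >= threshold]
    if radar_hit and not confident and rgb_candidates:
        # Radar says "something is out there" but nothing about where in the image or
        # what shape it has, so this rule promotes the most object-like candidate, however weak.
        confident = [max(rgb_candidates, key=lambda c: c[1])]
    return confident
```

Suppose the radar return comes from an overpass and the image contains only a weak shadow, for example `fuse_binary_radar(True, [("shadow", 0.2)])`. The rule then reports the shadow as an object, which is exactly the kind of false positive the comment means.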


stefan_ 1823 days ago

You have fallen victim to the trap that this video so perfectly laid out for you. Tesla used the mmWave radar that has been in cars forever, since it's a good way to do emergency braking and things like adaptive cruise control, particularly when you are a new company and you need these capabilities on your luxury sedan from day one. Now that they are much further along in their FSD efforts, they realize this mmWave radar isn't very helpful anymore. Cool, but nobody else was using it to begin with. LIDAR is a totally and utterly different sensor technology.


sumnuyungi 1823 days ago

Didn't Andrej show examples of emergency braking and how poorly radar performed vs. vision?


ra7 1823 days ago

He showed how poorly their outdated radar performed. The rest of the industry uses newer, vastly superior radars.


sumnuyungi 1823 days ago

What radar did Tesla use? What does the rest of the industry use?


ra7 1823 days ago

They are using Continental 2D radars from 2014. The rest of the SDC industry uses what are called 4D high-resolution imaging radars. That's why Teslas suffer from problems like phantom braking under bridges.
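
One common explanation for the bridge case is elevation. A radar that reports range, azimuth, and radial velocity but no elevation cannot distinguish a stationary return from an overpass from one from a stopped car at the same range and bearing. A 4D imaging radar adds the elevation angle. The sketch below assumes that reading of "2D" vs "4D". The mounting height and the 4 m clearance are made-up values, and real systems aggregate many returns over time rather than classifying a single point.

```python
import math
from dataclasses import dataclass
from typing import Optional

SENSOR_HEIGHT_M = 0.5  # hypothetical radar mounting height above the road
CLEARANCE_M = 4.0      # hypothetical: anything higher is treated as overhead structure


@dataclass
class RadarReturn:
    range_m: float
    azimuth_deg: float
    radial_velocity_mps: float             # Doppler; not used by the height check below
    elevation_deg: Optional[float] = None  # None models a radar with no elevation measurement


def classify_stationary(ret: RadarReturn) -> str:
    """Label a stationary return as 'overhead', 'obstacle', or 'ambiguous' (no elevation)."""
    if ret.elevation_deg is None:
        # Without elevation, a bridge and a stopped car at this range/bearing look the same.
        return "ambiguous"
    height = SENSOR_HEIGHT_M + ret.range_m * math.sin(math.radians(ret.elevation_deg))
    return "overhead" if height > CLEARANCE_M else "obstacle"
```

A return at 80 m and 4 degrees of elevation works out to about 6 m above the road, so the sketch labels it overhead. The same return with no elevation field can only be labeled ambiguous.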


jiofih 1822 days ago

What models on the market use 4D imaging radars? All the info I can find shows product releases in the past 1-2 years aimed at future vehicles.


bobsomers 1823 days ago

> That was one of the major points of the talk - the vision-only system is more accurate than the one with other modalities thrown in, even though the latter has more data. We intuit that the fusion network should just learn to ignore the bad sensor when it's unreliable, but this rarely happens in practice.

This makes some important assumptions, namely that Tesla built lidar and radar perception pipelines and sensor fusion of quality equivalent to their competitors', and then decided they were unnecessary.

Given that their competitors have shown substantially better perception than Tesla, and that Tesla has a significant economic incentive to deliver autonomous driving on a sensor suite that already shipped years ago, I find that difficult to believe. Did Tesla build perception good enough to dismiss lidar and radar purely on their merits? Unlikely, I think. Did an intern build a student-quality lidar pipeline that "proved" Elon's camera-first approach is the right one? More likely.


jiofih 1822 days ago

Karpathy joined Tesla way before they ditched radar. You’re saying it’s more likely that he based his work on a “student-quality” prototype?


aaaxyz 1823 days ago

>We intuit that the fusion network should just learn to ignore the bad sensor when it's unreliable, but this rarely happens in practice.

That sounds like a problem with the network's architecture, not the data itself.
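
One architectural answer along these lines is gated fusion. The network outputs a reliability score per modality and mixes the modality features with softmax weights, so an unreliable sensor can be down-weighted per input instead of by one fixed rule. The sketch below hand-sets the logits that a real gating network would have to learn, and learning them is exactly the hard part. The names and numbers are illustrative assumptions, not a description of any production system.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(features: List[List[float]],
                 reliability_logits: List[float]) -> Tuple[List[float], List[float]]:
    """Mix per-modality feature vectors with softmax weights over reliability logits."""
    weights = softmax(reliability_logits)
    dim = len(features[0])
    fused = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return fused, weights
```

Take a camera feature `[1.0, 2.0]`, a garbage radar feature `[100.0, -50.0]`, and logits `[5.0, -5.0]`. The fused vector stays within about 0.01 of the camera feature. With equal logits, the same call degrades to a plain average, which is the "rotten stew" case.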
