by liuliu 1823 days ago
You get sparse point cloud from LIDAR sensors, not accurate 3D maps. This is the main reason why some people think LIDAR may not work well (mostly, only comma.ai and Tesla folks).

Vision can also get you 3D maps, either in an active manner (IR floodlight or structured lighting) or not. I will reserve my judgement until I see more from either side.


This framing is a common error in the debate. It's not cameras or lidar, it's cameras or cameras + lidar + radar. Nobody is driving on lidar alone. Many others actually have more cameras and are doing substantially more vision than Tesla is, they're just fusing lidar and radar perception with their vision pipeline. It gives you a more robust view of the world than using a single sensor modality.
If you have one piece of rotten meat in a perfect stew, you still have a disgusting dish. Good sensor fused with garbage in is still garbage in. That was one of the major points of the talk - the vision-only system is more accurate than the one with other modalities thrown in, even though the latter has more data. We intuit that the fusion network should just learn to ignore the bad sensor when it's unreliable, but this rarely happens in practice.

If anything, knowing when to reliably ignore a sensor modality is the kind of intuition more associated with general AI.

A similar paradox occurs when trying to fuse multispectral imagery. You'd think early fusion of RGB and IR would be better since it gives the higher-resolution filters access to more data, but it does worse than late fusion. My understanding is that late fusion forces the network to "work harder" to solve object detection using IR only, and then once you've wrung what you can out, then you fuse with RGB detections.
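
To make the early/late distinction above concrete, here is a minimal Python sketch. It is only an illustration: `early_fuse`, `late_fuse`, the IoU threshold, and the noisy-OR score merge are all assumptions made up for this example, not anything taken from the talk or from a specific detector.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float]            # (box, confidence score in [0, 1])


def early_fuse(rgb: List[List[float]], ir: List[float]) -> List[List[float]]:
    """Early fusion: append the IR value to each RGB pixel so that a single
    detector would see one 4-channel input."""
    return [pixel + [ir_value] for pixel, ir_value in zip(rgb, ir)]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def late_fuse(rgb_dets: List[Detection], ir_dets: List[Detection],
              iou_threshold: float = 0.5) -> List[Detection]:
    """Late fusion: each modality is detected on its own, and only the
    resulting detections are merged. Overlapping pairs get a noisy-OR
    score (an assumed merge rule); unmatched detections are kept as-is."""
    fused: List[Detection] = []
    unmatched_ir = list(ir_dets)
    for box, score in rgb_dets:
        match = next((d for d in unmatched_ir if iou(box, d[0]) >= iou_threshold), None)
        if match is None:
            fused.append((box, score))
        else:
            unmatched_ir.remove(match)
            fused.append((box, 1.0 - (1.0 - score) * (1.0 - match[1])))
    fused.extend(unmatched_ir)
    return fused
```

In the late-fusion path the IR detector has to produce its own boxes before it ever sees RGB, which is the "work harder" effect described above; in the early-fusion path a single network is free to lean on whichever channels are easiest.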

Since radar is "one pixel" there's essentially only one object detector possible: object or nothing. If yes-object, fusion tries really hard to make sense of the RGB filters to figure out what partial detection looks like an object, which is almost always a false positive.

You have fallen victim to the trap that this video so perfectly laid out for you. Tesla used the mmWave radar that has been in cars forever since it's a good way to do emergency braking and things like adaptive cruise control, particularly when you are a new company and you need these capabilities on your luxury sedan from day one. Now that they are much further along in their FSD efforts, they realize this mmWave radar isn't very helpful anymore. Cool, but nobody else was using it to begin with. LIDAR is a totally and utterly different sensor technology.
Didn't Andrej show examples of emergency braking and how poorly radar performed v. vision?
He showed how poorly their outdated radar performed. The rest of the industry uses newer, vastly superior radars.
What radar did Tesla use? What does the rest of the industry use?
> That was one of the major points of the talk - the vision-only system is more accurate than the one with other modalities thrown in, even though the latter has more data. We intuit that the fusion network should just learn to ignore the bad sensor when it's unreliable, but this rarely happens in practice.

This makes some important assumptions, namely that Tesla built lidar and radar perception pipelines and sensor fusion of equivalent quality to their competitors', and then decided they were unnecessary.

Given that their competitors have shown substantially better perception than Tesla, and that Tesla has a significant economic incentive to deliver autonomous driving on a sensor suite that already shipped years ago, I find that difficult to believe. Did Tesla build good enough perception to dismiss lidar and radar purely on their merits? Unlikely, I think. Did an intern build a student-quality lidar pipeline that "proved" Elon's camera-first approach is the right one? More likely.

Karpathy joined Tesla way before they ditched radar. You’re saying it’s more likely that he based his work on a “student-quality” prototype?
>We intuit that the fusion network should just learn to ignore the bad sensor when it's unreliable, but this rarely happens in practice.

That sounds like a problem with the network's architecture and not the data itself

Sensor fusion is hard, especially with data at various qualities (different generations of LIDAR, radar, probably images are more portable actually).

I am actually on the sensor-fusion side, and think a transformer can merge everything and generate a coherent world-view. But this is a hard problem, and people who side-step it after evaluating it shouldn't be dismissed out of hand.

For one:

> It gives you a more robust view of the world than using a single sensor modality.

How can we evaluate that correctly?

One of the arguments made in this talk is which sensor to trust when there is a disagreement. I'd love to see the scenarios where this happens with three types of sensors and the logic they use to choose the "right" one.
You can't avoid this problem by only having a single sensor modality, because you still have to decide if you trust your sensor data or not. In effect, you actually made the problem harder because you have no other information from any other sensor modalities to help understand the world around you.

In effect, they solved the disagreement problem by sticking their head in the sand and pretending their sensors are never wrong.

You’re intent on oversimplifying the issue to support your view, but this comment makes no sense. Of course you can avoid sensor fusion issues by not doing sensor fusion. Regardless of how many sources you have, you still have to decide if you trust the data; having fewer sensors doesn’t change that. And having more sensors means you have to decide which input to trust.
That's a problem for deciding which system drives the thing (e.g. do you use cameras for detecting oncoming cars and objects, or lidar/radar), but if there are major disagreements between them it would make the most sense to disengage the autonomous part and ask the human driver to take over, much like current TSLA cars do. Obviously that's not possible if you want level 4 self-driving, but it can be good enough for level 3.
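
A rough Python sketch of the "disengage on major disagreement" idea above. Everything here is hypothetical: the `arbitrate` function, the `Action` names, and the 20% threshold are placeholders for illustration, not how any shipping system decides.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    CONTINUE = "continue"    # sensors agree well enough to keep driving
    HAND_OVER = "hand_over"  # major disagreement: ask the human to take over


def arbitrate(camera_range_m: Optional[float],
              lidar_range_m: Optional[float],
              max_rel_disagreement: float = 0.2) -> Action:
    """Compare the distance to the nearest obstacle as reported by two
    modalities. None means "this modality sees nothing in the path"."""
    if camera_range_m is None and lidar_range_m is None:
        return Action.CONTINUE   # both agree the path is clear
    if camera_range_m is None or lidar_range_m is None:
        return Action.HAND_OVER  # one sees an obstacle, the other does not
    largest = max(camera_range_m, lidar_range_m)
    if largest <= 0.0:
        return Action.HAND_OVER  # degenerate readings, do not trust either
    rel_diff = abs(camera_range_m - lidar_range_m) / largest
    return Action.HAND_OVER if rel_diff > max_rel_disagreement else Action.CONTINUE
```

For example, `arbitrate(50.0, 52.0)` continues, while `arbitrate(50.0, None)` or `arbitrate(50.0, 20.0)` hands over. Note this only detects disagreement; it says nothing about which sensor was right, which is the harder question raised upthread.
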
> Many others actually have more cameras and are doing substantially more vision than Tesla

This is actually my main criticism of Tesla's approach. They don't seem to have enough cameras to do the job well, and it's showing in a lot of actual system limitations.

> You get sparse point cloud from LIDAR sensors

Take a look at how dense the point cloud from Waymo's 5th gen LIDAR is: https://www.youtube.com/watch?v=COgEQuqTAug&t=11599s. They just talked about this a few days ago.

Any idea why this video went private?
No idea, but here's another video of a different Waymo presentation showing the same point cloud: https://www.youtube.com/watch?v=uOLLrZzljs8&t=5420s
They're cutting up the original 3 hour long Livestream into separate videos for each presentation.
Point clouds from the next generation of LIDAR are looking less and less sparse.

https://youtu.be/COgEQuqTAug?t=11601

This video is private, I cannot access.
LIDAR is valuable as a safety feature, i.e. it can (unlike radars or cameras) reliably see (at least in clear weather) if there's anything in the car's path warranting evasion/braking maneuvers. In particular it's important that the LIDAR is dumb, i.e. its failure modes are predictable.
LIDAR operates by timing how long a photon takes to reflect back from a surface; it doesn't guarantee that it sees everything. As you stated already, it cannot see in snow or rain.
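
For reference, the time-of-flight arithmetic is just distance = (speed of light × round-trip time) / 2. A minimal sketch in Python, where the function name is made up for illustration and atmospheric effects are ignored:

```python
SPEED_OF_LIGHT_MPS = 299_792_458.0  # speed of light in vacuum, m/s


def tof_distance_m(round_trip_s: float) -> float:
    """Distance to the reflecting surface from a pulse's round-trip time.
    The pulse travels out and back, hence the division by two."""
    if round_trip_s < 0:
        raise ValueError("round-trip time must be non-negative")
    return SPEED_OF_LIGHT_MPS * round_trip_s / 2.0
```

A return arriving one microsecond after the pulse left corresponds to a surface roughly 150 m away. If no photon comes back (absorbed, scattered by rain or snow, or out of range), there is simply no return to time.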

I am actually on the sensor-fusion side, but I don't think we should jump to the conclusion that LIDAR is the best 3D mapping method.

For one example, a truck has a braking distance of 600 ft; both Velodyne and OS1 LIDARs have less range than that.

I agree about LIDARs not being the best general 3D mapping method; my point was mostly about using it as a dumb physics-based safety system. For autonomous trucks, limited LIDAR range could be mitigated by reducing speed accordingly and/or employing more powerful (or SWIR) LIDARs, since trucks are more expensive and there could be a bigger budget for sensors.
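
To put a number on "reducing speed accordingly", here is a back-of-the-envelope Python sketch using the usual constant-deceleration model: stopping distance = v × latency + v² / (2 × deceleration). The function names and any deceleration or latency values passed in are assumptions for illustration, not measured truck or sensor figures.

```python
import math


def stopping_distance_m(speed_mps: float, decel_mps2: float, latency_s: float) -> float:
    """Distance covered during the system latency plus the braking phase,
    assuming constant deceleration."""
    return speed_mps * latency_s + speed_mps ** 2 / (2.0 * decel_mps2)


def max_speed_for_range_mps(sensor_range_m: float, decel_mps2: float,
                            latency_s: float) -> float:
    """Highest speed at which the vehicle can still stop within the sensor
    range. Solves v*t + v^2 / (2a) = R for the positive root v."""
    if sensor_range_m < 0 or decel_mps2 <= 0 or latency_s < 0:
        raise ValueError("range and latency must be >= 0, deceleration > 0")
    a, t, r = decel_mps2, latency_s, sensor_range_m
    return -a * t + math.sqrt((a * t) ** 2 + 2.0 * a * r)
```

With an assumed deceleration of 5 m/s² and zero latency, a 10 m sensor range allows 10 m/s; plugging in a real sensor's usable range and a loaded truck's actual deceleration would give the speed cap this mitigation implies.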