by sairahul82 2499 days ago
The first thing we need to remember is that self-driving systems don't work like our brains. If they did, we wouldn't need to train them with billions of images. So the main problem is not just building the 3d models. For example, we don't crash into a car just because we have never seen that car model or that kind of vehicle before. Check https://cdn.technologyreview.com/i/images/bikeedgecasepredic... A human would never think that there is a bike in front of us.

Humans do a lot more than just identify an image or do 3D reconstruction. We have context about the roads, we constantly predict the movement of other cars, we know how to react based on the situation, and most importantly we are not fooled by simple image occlusions. Essentially we have a gigantic correlation engine that makes decisions based on comprehending the different things happening on the road.

The AI algorithms we train do not work the same way we do. They depend too heavily on identifying the image. Lidar provides another signal to the system. It provides redundancy and allows the system to make the right decision. Take the image linked above as an example.

We may not need lidar once the technology matures, but at this stage it is a pretty important redundant system.

4 comments

> So the main problem is not just building the 3d models

That's not relevant when discussing which technology to use to build the 3d models. Everything you said is accurate until the last few sentences. Lidar provides the same information (line of sight depth) as stereo cameras, just in a different way. The person you're responding to is talking about depth from stereo, not cognition.

> Lidar provides the same information (line of sight depth) as stereo cameras, just in a different way.

This is incorrect: the amount of parallax you need to get the same kind of accurate depth using cameras is infeasible. Velodynes and other common lidars now get you accurate points at 150m+. Cameras can't do that, and if you use nets to guess, you'll still make mistakes.
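
A rough back-of-envelope sketch of the parallax argument, assuming an idealized pinhole stereo pair where depth is Z = f * B / d and a small disparity error dd gives a depth error of about Z^2 / (f * B) * dd. The focal length, baseline, and matching error below are illustrative assumptions, not the specs of any real rig, and the function names are made up for this example.

```python
# Back-of-envelope stereo depth error under an idealized pinhole model.
# Every number here is an illustrative assumption, not a real sensor spec.

FOCAL_PX = 1000.0          # assumed focal length, in pixels
BASELINE_M = 0.3           # assumed distance between the two cameras, in meters
DISPARITY_ERROR_PX = 0.25  # assumed sub-pixel matching error, in pixels


def stereo_depth(focal_px, baseline_m, disparity_px):
    """Depth from disparity: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px


def depth_error(depth_m, focal_px, baseline_m, disparity_error_px):
    """First-order depth uncertainty: dZ ~= Z^2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


def baseline_for_error(depth_m, focal_px, disparity_error_px, target_error_m):
    """Baseline needed to keep the depth error at target_error_m at depth_m."""
    return depth_m ** 2 * disparity_error_px / (focal_px * target_error_m)


if __name__ == "__main__":
    # Error grows with the square of the distance.
    for z in (10.0, 50.0, 150.0):
        err = depth_error(z, FOCAL_PX, BASELINE_M, DISPARITY_ERROR_PX)
        print(f"depth {z:6.1f} m -> error of about {err:.2f} m")

    # How wide would the rig need to be for half a meter of error at 150 m?
    needed = baseline_for_error(150.0, FOCAL_PX, DISPARITY_ERROR_PX, 0.5)
    print(f"baseline needed for 0.5 m error at 150 m: about {needed:.2f} m")
```

With these assumed numbers the error is under 10 cm at 10 m but close to 19 m at 150 m, and holding it to half a meter at 150 m would take a baseline of roughly 11 m. Higher-resolution sensors and longer lenses shift the numbers, but the quadratic growth with distance stays.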

> The person you're responding to is talking about depth from stereo, not cognition.

You miss the point; saying human 3D reconstruction works because of sensors without world context is naive. The response was trying to capture that; human perception systems utilize context / background knowledge extensively.

> the amount of parallax you need to get the same kind of accurate depth using cameras is infeasible. Velodynes and other common lidars now get you accurate points at 150m+

I meant they both just provide line of sight depth.

The point being made by the first comment is that human eyeballs placed a couple of inches apart are currently the gold standard for the actual looking part. So the right set of cameras is by definition sufficient for the looking part of driving. The cameras just have to replace eyes well enough. The brain replacement is farther down the chain.

From the OP:

> humans can build near perfect 3D representations of the world with 2D images stitched together with the parallax neural nets in our brain

This is a statement about cognition. And the response addresses this.

Your response:

> The person you're responding to is talking about depth from stereo, not cognition.

I think this is the disconnect. The person _is_ talking about cognition. The OP makes a claim about how humans see, connected to how the human brain works. The response explains why camera-based image recognition right now is a lot worse than your eyes (a big piece of the answer is your brain).

> The cameras just have to replace eyes well enough

So yes this is nice in theory. But I also get the sense most people don't realize just how large the chasm is today between cameras and human eyes. They don't "just provide line of sight depth." Dynamic range, field of view, reliability even under conditions like high heat -- there are many other dimensions where they just aren't analogous yet.

> The first thing we need to remember is that self-driving systems don't work like our brains. If they did, we wouldn't need to train them with billions of images.

I had always assumed that the first few years of infancy were effectively a period of training a neural net (the brain) against a continuous series of images (everything seen).

Where is the bike example from? All these instances of recognition error are meaningless when they don’t come from actual production systems by auto makers. They don’t just slap OpenCV into a car.

Having a redundant system is the key here.

It also provides a reliable source of data. If humans had a LiDAR in our system, we would use it to improve our decisions.

I don’t see why we should limit the AV.