Hacker News
by theptip 532 days ago
Karpathy discussed this at length on the Lex Fridman podcast: https://lexfridman.com/andrej-karpathy/

IIRC it’s the section (1:23:25) – Camera vision

The TL;DR is that sensor fusion is really hard, and their bet was that keeping the training pipelines simpler would let them scale faster/easier, and human vision is the existence proof that it can be done without lidar.


One of the big flaws in Karpathy's logic is that it implies human-level vision is acceptable and sufficient for an AV. The reality, as Cruise found out, seems to be that society will demand that AVs be much safer than humans.

Human vision is an existence proof for human-level performance without lidar, but Waymo is an existence proof for 10x human performance WITH lidar. Right now the latter is where the bar is, and it'll keep being raised. I don't think at this point one could get away with deploying AVs at scale that are significantly less safe than Waymo.

Also: if sensor fusion is so hard, why is Waymo able to solve it but not Tesla?

> Also: if sensor fusion is so hard, why is Waymo able to solve it but not Tesla?

I think Karpathy's point is that Tesla wants to avoid the "entropy" that comes with adding a sensor (a concept most senior software engineers will recognize). Every sensor you add, and every hardware revision of it (sensor hardware does get updated), requires recalibrating the software stack and revisiting the hardware design, which introduces new points of failure every time you roll it out.

According to Karpathy, Tesla does use Lidar -- but only at training time, as a source of truth. Once the weights are learned, they operate without the Lidar.
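
To make the "lidar only at training time" idea concrete, here is a toy sketch. It is my own illustration, not Tesla's pipeline: the synthetic data, the single "camera feature", the linear model, and all function names are hypothetical. The only point is that the lidar depth appears as a label during fitting and is absent from the inference path.

```python
import random


def make_sample(rng):
    """Hypothetical synthetic data: one camera-derived feature plus a lidar depth label."""
    camera_feature = rng.uniform(0.0, 10.0)
    # Assumed ground-truth relationship, with a little lidar measurement noise.
    lidar_depth = 2.0 * camera_feature + 1.0 + rng.gauss(0.0, 0.05)
    return camera_feature, lidar_depth


def fit_depth_model(samples):
    """Training: least-squares fit of depth ~ w * feature + b, with lidar depth as the label."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    var = sum((x - mean_x) ** 2 for x, _ in samples)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b


def predict_depth(model, camera_feature):
    """Inference: takes the camera feature only; there is no lidar input here."""
    w, b = model
    return w * camera_feature + b


rng = random.Random(0)
training_set = [make_sample(rng) for _ in range(1000)]
model = fit_depth_model(training_set)
estimated_depth = predict_depth(model, 5.0)
```

A real system would swap the linear fit for a neural network over images, but the split is the same: the extra sensor only shows up in the training set, so the deployed car doesn't need to carry it.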

Having a full sensor suite may work for Waymo at its current scale (limited cities), but scaling beyond that poses problems.

Tesla, on the other hand, has to work with a different set of scaling economics: that of a mass-market vehicle already deployed globally.