by jfim 758 days ago
They are doing inferencing on the vehicle for lane keeping, traffic sign detection, emergency braking, etc.

The biggest problem is really how do you get to 10^n miles per disengagement, for n>=5. Waymo is kinda getting there, Tesla isn't anywhere near that today.
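To put the 10^n figure in perspective, here is a rough back-of-the-envelope sketch. The 12,000 miles per year is an assumed figure for a typical driver, not something from Waymo or Tesla data, and the function names are made up for illustration.

```python
# Hypothetical helper: how many disengagements would one driver expect to see
# per year if the system manages 10^n miles per disengagement?

ASSUMED_MILES_PER_YEAR = 12_000  # assumption, not a measured value


def expected_disengagements(miles_driven, n):
    """Expected disengagements over miles_driven at 10^n miles per disengagement."""
    return miles_driven / (10 ** n)


if __name__ == "__main__":
    for n in range(2, 7):
        per_year = expected_disengagements(ASSUMED_MILES_PER_YEAR, n)
        print(f"n={n}: about {per_year:g} disengagements per year")
```

Under that assumption, n=3 still means roughly one disengagement a month, while n=5 means about one every eight years of driving.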

Getting there is really hard, because that's when you get all of the long tail events like bears, moose, wild turkeys, horse mounted police officers, costume conventions, pickup trucks carrying traffic cones and road signs, flooded streets, construction pilot cars, vehicles driving the wrong way on the highway, downed electric poles, NYC steam plumes, and tons of other scenarios. Highway driving in nice and sunny conditions is easy compared to that.


I am not sure how much is really done via inferencing, if anything at all. The way "Tesla Vision" behaves in a parking garage simply does not look like what I would expect to come out of inferencing. It looks very, very much like a pretty bad heuristic. Just look at what it makes of blind spots, the parts the cameras can't see. There is absolutely nothing like "according to my model, there should be X in this spot". The same goes for its distance sensing in these situations. "Oh, there is a pipe on that wall, which is likely at a different distance from me than the wall. I might not wanna crash into that" is so trivial that nobody would even use it as a Captcha these days. A model that does not "know" what the third dimension is?

Do you know of any reverse engineering that proves that anything related to inferencing is really running on the NPUs?

Also, just as you said, there are tons of corner cases in the real world, especially once you aren't on a 10-lane US highway designed for monster trucks driven by 16-year-olds (no offence) but in one of the roundabouts of hell in Paris.

Where would the training data be coming from?

So, I have my doubts.

During summer, there is a red flower growing near the entrance of my parking garage. It is constantly seen as a red light, and the entrance of my garage is often mistaken for a huge truck magically appearing out of nowhere. Again: nobody would use this as a Captcha these days: "Is this a red flower or a traffic light?".

Again, smells like a heuristic: "Amount of red pixels in a certain shape and spot".
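To make the suspicion concrete, this is roughly what such a red-pixel heuristic could look like. It is purely a guess at the kind of rule meant here, not anything known about Tesla's software; the thresholds, the function name and the image format (rows of RGB tuples) are all made up.

```python
# Hypothetical "count the red pixels in a region" rule. Nothing here is taken
# from a real vehicle; thresholds and names are invented for illustration.

def looks_like_red_light(image, region, min_fraction=0.5):
    """image: list of rows of (r, g, b) tuples; region: (top, left, bottom, right)."""
    top, left, bottom, right = region
    total = 0
    red = 0
    for row in image[top:bottom]:
        for r, g, b in row[left:right]:
            total += 1
            # "Red enough": strong red channel, weak green and blue.
            if r > 180 and g < 90 and b < 90:
                red += 1
    return total > 0 and red / total >= min_fraction


if __name__ == "__main__":
    # A 4x4 patch of flower-coloured pixels triggers the rule just like a lamp would.
    flower = [[(220, 30, 40)] * 4 for _ in range(4)]
    print(looks_like_red_light(flower, (0, 0, 4, 4)))
```

A rule like this has no notion of whether a traffic light makes sense at that location, which would match the flower behaviour.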

Oh I see what you mean.

Typically, inference in a machine learning context means feeding a model some input and looking at its output. I'm pretty sure that they are running some model on the vehicle that takes pixels as input and says this part of the image is a car/truck/traffic sign/lane line/etc. It might be misclassifying things (e.g. the flower as a red light), but it would still be running some kind of model.
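As a toy illustration of inference in that sense, here is a minimal sketch: a tiny linear classifier with hand-picked weights that maps one RGB pixel to a label. The labels, weights and function name are invented for the example and have nothing to do with whatever actually runs on the vehicle; a real perception model would be a trained neural network over whole images.

```python
import math

# Hypothetical labels and weights (one row per label: r, g, b, bias).
LABELS = ["road", "car", "traffic_light"]
WEIGHTS = [
    [-1.0, -1.0, -1.0, 1.5],   # road: dark pixels
    [0.2, 0.2, 1.5, -0.5],     # car: bluish pixels (arbitrary choice)
    [2.5, -1.5, -1.5, -0.5],   # traffic_light: strongly red pixels
]


def infer(pixel):
    """Feed one (r, g, b) pixel through the toy model; return (label, probability)."""
    r, g, b = (c / 255 for c in pixel)
    features = [r, g, b, 1.0]
    scores = [sum(w * x for w, x in zip(row, features)) for row in WEIGHTS]
    # Softmax turns the raw scores into probabilities that sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(LABELS)), key=probs.__getitem__)
    return LABELS[best], probs[best]


if __name__ == "__main__":
    print(infer((220, 30, 40)))  # a red flower pixel
    print(infer((40, 40, 40)))   # dark asphalt
```

With these made-up weights the flower pixel comes out as "traffic_light". The answer is wrong, but it is still the output of a model, which is the distinction being made here.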

As you point out though, the model only seems to do some simple object detection and doesn't have much of an understanding of what it sees (e.g. does it make sense that there would be a traffic light at this location?). There are plenty of videos of it getting confused by all kinds of situations (e.g. this one from a few years ago: https://www.businessinsider.com/tesla-fsd-full-self-driving-... ).