by Fischgericht
758 days ago
I am not sure how much is really done via inference, if anything at all. The way "Tesla Vision" behaves in a parking garage simply does not look like what I would expect to come out of inference. It looks very, very, very much like a pretty bad heuristic. Just look at what it makes of blind spots, the parts the cameras can't see. There is absolutely nothing like "according to my model, there should be X in this spot".
The same goes for its distance sensing in these situations. "Oh, there is a pipe on that wall, which is likely at a different distance from me than the wall. I might not want to crash into that" is so trivial that nobody would even use it as a Captcha these days. A model that does not "know" what the third dimension is? Do you know of any reverse engineering that proves there really is any inference running on the NPUs?
Also, just as you said - there are tons of corner cases in the real world, especially once you aren't on a 10-lane US highway designed for monster trucks driven by 16-year-olds (no offence) but in one of the roundabouts of hell in Paris. Where would the training data have come from?
So, I have my doubts. During summer, a red flower grows near the entrance of my parking garage. It is constantly seen as a red light, and the entrance of my garage is often mistaken for a huge truck suddenly, magically appearing. Again, nobody would use this as a Captcha these days: "Is this a red flower or a traffic light?" Again, it smells like a heuristic: "amount of red pixels in a certain shape and spot".
|
Typically, inference in a machine learning context means feeding a model some input and looking at its output. I'm pretty sure they are running some model on the vehicle that takes pixels as input and says this part of the image is a car/truck/traffic sign/lane line/etc. It might be misclassifying things (e.g. the flower as a red light), but it would still be running some kind of model.
As you point out, though, the model only seems to do some simple object detection and doesn't have much of an understanding of what it sees (e.g. whether it makes sense for a traffic light to be at this location). There are plenty of videos of it getting confused by all kinds of situations (e.g. this one from a few years ago: https://www.businessinsider.com/tesla-fsd-full-self-driving-... ).
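To make the "amount of red pixels in a certain shape and spot" idea concrete, here is a minimal sketch of what such a heuristic could look like. Everything in it is made up for illustration (the function names, the thresholds, the tiny synthetic images); it says nothing about what actually runs on the car. It only shows why a pure pixel-counting rule cannot tell a red flower from a red light, since neither has any notion of context.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]          # (R, G, B), each 0..255
Image = List[List[Pixel]]             # rows of pixels
Region = Tuple[int, int, int, int]    # (top, left, bottom, right), end-exclusive

GRAY: Pixel = (128, 128, 128)


def is_red(p: Pixel, min_red: int = 180, max_other: int = 90) -> bool:
    """Hypothetical colour rule: strong red channel, weak green and blue."""
    r, g, b = p
    return r >= min_red and g <= max_other and b <= max_other


def red_fraction(image: Image, region: Region) -> float:
    """Share of pixels inside the region that pass the colour rule."""
    top, left, bottom, right = region
    total = 0
    red = 0
    for row in image[top:bottom]:
        for p in row[left:right]:
            total += 1
            red += is_red(p)
    return red / total if total else 0.0


def heuristic_red_light(image: Image, region: Region, threshold: float = 0.5) -> bool:
    """'Enough red pixels in a certain spot' means red light. No context used."""
    return red_fraction(image, region) >= threshold


def scene_with_blob(color: Pixel, size: int = 8) -> Image:
    """Gray background with a 3x3 blob of the given colour at rows/cols 2..4."""
    image = [[GRAY for _ in range(size)] for _ in range(size)]
    for y in range(2, 5):
        for x in range(2, 5):
            image[y][x] = color
    return image


WATCHED_SPOT: Region = (2, 2, 5, 5)

traffic_light = scene_with_blob((255, 0, 0))   # saturated red lamp
red_flower = scene_with_blob((220, 30, 40))    # slightly different red
empty_wall = scene_with_blob(GRAY)             # nothing there

light_flagged = heuristic_red_light(traffic_light, WATCHED_SPOT)
flower_flagged = heuristic_red_light(red_flower, WATCHED_SPOT)
wall_flagged = heuristic_red_light(empty_wall, WATCHED_SPOT)

print("traffic light flagged:", light_flagged)
print("red flower flagged:", flower_flagged)
print("empty wall flagged:", wall_flagged)
```

Note that a learned detector is, from the outside, the same kind of function (pixels in, label out), so seeing the flower misclassified does not by itself tell you whether a hand-written rule like this or a trained model produced the answer.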