The box isn't as black as you might think: they're not training one monolithic AI model; several separate systems are involved. The models aren't entirely freeform either; engineers embed knowledge of how the world is structured into those networks.
They can use those intermediate representations to project a kind of thought process that others can inspect, and you've probably seen videos and images of that kind of thing too: a rendering of the 3D world the car has perceived, annotated with labels and motion vectors, segmented into objects, classified by surface type, perhaps even including projections of the other actors' likely future intent, etc.
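To make that a bit more concrete, here's a minimal sketch of what one slice of such an intermediate representation might look like. Everything in it (`TrackedObject`, `extrapolate`, `describe`, the ego-frame coordinates) is hypothetical and not any company's actual format. The constant-velocity extrapolation is a deliberately crude stand-in for the learned intent/trajectory prediction a real stack would presumably use.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class TrackedObject:
    """One perceived actor/obstacle in a hypothetical intermediate world model."""
    track_id: int
    label: str                        # classifier output, e.g. "pedestrian", "car"
    position: tuple[float, float]     # metres, ego frame (x forward, y left)
    velocity: tuple[float, float]     # motion vector in m/s, same frame
    predicted_path: list[tuple[float, float]] = field(default_factory=list)

def extrapolate(obj: TrackedObject, horizon_s: float = 3.0,
                step_s: float = 1.0) -> list[tuple[float, float]]:
    """Constant-velocity stand-in for a learned 'likely future intent' predictor."""
    steps = round(horizon_s / step_s)  # round() avoids float truncation surprises
    x, y = obj.position
    vx, vy = obj.velocity
    return [(x + vx * step_s * k, y + vy * step_s * k) for k in range(1, steps + 1)]

def describe(obj: TrackedObject) -> str:
    """Human-readable line, the kind of label you see overlaid in those videos."""
    x, y = obj.position
    vx, vy = obj.velocity
    return (f"#{obj.track_id} {obj.label} at ({x:.1f}, {y:.1f}) m, "
            f"moving ({vx:.1f}, {vy:.1f}) m/s")
```

For example, a pedestrian 20 m ahead and 5 m to the left, walking toward the car's lane at 1.5 m/s, gets a predicted path of `[(20.0, 3.5), (20.0, 2.0), (20.0, 0.5)]`, which is exactly the sort of thing an engineer can look at after the fact.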
Sure, you can't fully understand how each individual perceptron contributes to the whole, but you can understand why the car suddenly veered right, what its planned route was, what it thought other traffic participants were about to do, which obstacles it saw, whether it noticed the pedestrians, which traffic rules it was aware of, whether it noticed the traffic lights (and which ones), how much time it thought remained, etc.
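One way that "why did it veer right" question becomes answerable is if the planner logs a structured record next to each manoeuvre. The sketch below assumes such a log exists; `Decision`, `explain`, and the field names are made up for illustration and aren't taken from any real self-driving stack.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional

@dataclass
class Decision:
    """Hypothetical record a planner might emit alongside each manoeuvre."""
    timestamp: float                 # seconds since start of the drive segment
    action: str                      # e.g. "keep_lane", "swerve_right", "brake"
    trigger_track_id: Optional[int]  # perceived object that caused it, if any
    trigger_label: Optional[str]
    reason: str

def explain(log: list[Decision], t: float) -> str:
    """Return the most recent decision at or before time t, in plain words."""
    past = [d for d in log if d.timestamp <= t]
    if not past:
        return "no decision logged yet"
    latest = max(past, key=lambda rec: rec.timestamp)
    if latest.trigger_label is not None:
        who = f"{latest.trigger_label} #{latest.trigger_track_id}"
    else:
        who = "no specific object"
    return f"t={latest.timestamp:.1f}s: {latest.action} ({who}): {latest.reason}"
```

With a log of `[Decision(0.0, "keep_lane", None, None, "corridor clear"), Decision(1.2, "swerve_right", 7, "car", "predicted to cut into the ego lane")]`, asking about t=1.5 returns `"t=1.2s: swerve_right (car #7): predicted to cut into the ego lane"`. The point is that the explanation comes from the intermediate layer, not from individual perceptrons.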
...at least, sometimes; I don't know anybody working at Tesla specifically.
And while Waymo emphasizes its lidar tech, I bet Tesla's team, despite using different sensors, also has somewhat similar (and similarly inspectable) intermediate representations.
IIRC, in the incident where the Tesla [Edit: Uber self-driving car] collided with a pedestrian pushing a bicycle in Arizona, the car repeatedly switched between classifying the input as a pedestrian and as a bicycle, and took no evasive action while it was trying to decide.
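Here's a toy illustration of how that kind of label flip-flopping *could* stall a response. It assumes, purely for illustration, a tracker that throws away an object's motion history whenever its class changes and only predicts a path once it has a few consistent frames. `Track`, `run`, and `MIN_FRAMES_FOR_PREDICTION` are hypothetical; this is not Uber's code or a claim about exactly how their system worked.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional

MIN_FRAMES_FOR_PREDICTION = 3  # hypothetical: frames of consistent history needed

@dataclass
class Track:
    label: str
    history: list[float] = field(default_factory=list)  # lateral positions, metres

    def update(self, label: str, lateral_pos: float) -> None:
        if label != self.label:
            # Toy assumption: a class change discards the accumulated motion history.
            self.label = label
            self.history = []
        self.history.append(lateral_pos)

    def predicted_lateral_speed(self, dt: float) -> Optional[float]:
        if len(self.history) < MIN_FRAMES_FOR_PREDICTION:
            return None  # not enough consistent history, so no predicted path
        return (self.history[-1] - self.history[0]) / (dt * (len(self.history) - 1))

def run(labels: list[str], positions: list[float], dt: float = 0.1) -> Optional[float]:
    """Feed per-frame classifications and positions; return the final speed estimate."""
    track = Track(labels[0])
    for label, pos in zip(labels, positions):
        track.update(label, pos)
    return track.predicted_lateral_speed(dt)
```

With positions `[3.0, 2.8, 2.6, 2.4, 2.2, 2.0]` and a stable `"pedestrian"` label, the estimate is about -2.0 m/s (moving toward the car's path). If the label alternates every two frames, the history never reaches three frames and the result is `None`. A planner waiting on that prediction would have nothing to react to.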
>the incident where the Tesla collided with a pedestrian pushing a bicycle in Arizona
That was Uber's self-driving car program. Notably, the SUV they were using has had pedestrian-detecting automatic braking for several years, though I'm sure it's not 100% reliable.
For example, Waymo has a public PR piece highlighting the kinds of things they can extract from the black box: https://blog.waymo.com/2021/08/MostExperiencedUrbanDriver.ht...