by dreaminvm 2621 days ago
They cherry-pick rare cases and use their fleet to get more examples of these situations. This seems like the right approach, given that more miles spent following the same car in a straight line are pretty useless.

My takeaway from the presentation is that Tesla will perform better than other companies in this space (although I don't know enough about Waymo to comment) due to the following:

- You want a large dataset (Tesla and many companies have this and can simulate)
- You want a varied/diverse dataset (Tesla and many companies have this and can simulate). The point here is that simulations work for simple cases (you can only simulate what you know), but simulating complex cases is close to the difficulty of actual FSD
- You want a real dataset (Tesla is the only company who can say this and can say they have data on how X00Ks of drivers will handle these situations)


My point was that it appears that Tesla doesn't have a large and varied dataset. It has a small and pre-selected data set, since the cars only transmit data when pre-determined triggers are fired. Thus, it doesn't matter how many 1000s of drivers Tesla "has" or how many "situations" they're in, since it's not actually collecting data from most of these situations.

And Autopilot's performance (including its numerous regressions) suggests very strongly either that it doesn't have a very large data set, or else that it has a large data set of everyone doing roughly the same thing almost all of the time. These are the two most logical explanations for Autopilot's tendency to veer toward freeway dividers even (especially?) after updates.

> My point was that it appears that Tesla doesn't have a large and varied dataset. It has a small and pre-selected data set,

I don't think that is a fair characterization. First, the notion of "large" is fairly subjective. More importantly, the fact that they collect data on pre-determined triggers is just a guarantee that the dataset is not over-fitted (let's say to the 280 and 101 in the Bay Area and to Elon's commute in LA) and instead has good coverage of the world.

Their ability to trigger on specific situations allows them to grow the dataset quickly in a supervised way. It is all about the granularity of these triggers. Imagine that you can express "collect situations in tunnels with jerk higher than X m/s^3" or "collect all lane change aborts in snow conditions", ... Within the next 24h you get data from all over the world, and this data is automatically tagged and classified by the neural net.
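To make the idea of trigger granularity concrete, here is a minimal sketch of how such rules could be written as predicates over a short summary of driving data. Everything in it is an assumption for illustration: the `Snapshot` fields, the `Trigger` type, the trigger names, and the `JERK_THRESHOLD` value (standing in for the "X" above) are hypothetical, and nothing here is taken from Tesla's actual system.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Snapshot:
    """Hypothetical summary of a few seconds of driving (illustrative fields only)."""
    in_tunnel: bool
    max_jerk: float            # peak jerk in m/s^3
    lane_change_aborted: bool
    snowing: bool


@dataclass
class Trigger:
    """A named rule; the car uploads data only when some rule matches."""
    name: str
    predicate: Callable[[Snapshot], bool]


# Placeholder for the "X" in the comment; not a real threshold.
JERK_THRESHOLD = 5.0

TRIGGERS = [
    Trigger("tunnel_high_jerk",
            lambda s: s.in_tunnel and s.max_jerk > JERK_THRESHOLD),
    Trigger("snow_lane_change_abort",
            lambda s: s.lane_change_aborted and s.snowing),
]


def fired_triggers(snapshot: Snapshot,
                   triggers: List[Trigger] = TRIGGERS) -> List[str]:
    """Return the names of all triggers whose predicate matches the snapshot."""
    return [t.name for t in triggers if t.predicate(snapshot)]


def should_upload(snapshot: Snapshot) -> bool:
    """Upload only if at least one trigger fired."""
    return bool(fired_triggers(snapshot))
```

Under these assumptions, a snapshot of a car following another car in a straight line fires no trigger and is never uploaded. That is the behavior both sides of this thread are describing: it keeps boring miles out of the dataset, but it also means the dataset only contains what someone thought to write a trigger for.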

Shadow mode would also transmit data, no?