by wstrange 3455 days ago
A lot of folks have been saying that Tesla will win, because they will gather more data from real conditions.

This never made sense to me. You certainly need enough data, but how you interpret and process that data is far more important.

4 comments

The popular idea that Tesla will just keep stuffing matrices down the throat of their training pipeline until a self-driving inference model emerges from the other end doesn't make a lot of sense.
Neural nets can take you most of the way there, but pattern recognition alone will not solve the driving problem to completion.

Waymo's autonomous platform is a Frankenstein of various machine learning techniques, and much of it isn't glamorous: it's less contingent on big breakthroughs than it is on elbow grease. Google demoed proof-of-concept full autonomy in 2012, and much of what they've been doing in the 4 years between then and now is the tedious job of addressing and validating their system across the full spectrum of edge cases that must be dealt with if they ever hope to foist their safety-critical software upon the public.

It's not clear to me that Tesla's current development paradigm will ever be sufficient to completely take the human out of the loop. Tesla's approach is incremental, and I suspect they'll have to make some big changes if they wish to fully close the gap. Waymo has kept their eye on the prize from day 1.

Also, how much of that data can you realistically send back to yourself? With Waymo, they can pull the arrays from the cars nightly if they need to. Tesla, on the other hand, has to send the data back over a customer's network connection, which would be much more limiting. So Tesla might be getting tons of miles, but if that's a billion not very detailed miles, it might not be as useful as a million incredibly detailed miles.
I think Tesla pays for the network connection to all of its cars. This sends back remote telemetry and other self-driving data. It also serves up software updates and monitors the car's network for intrusions via a VPN.
That network connection is cellular, so at most it's an LTE connection, but if I remember right, it's actually just 3G. I can't believe they are pushing the kind of data people are talking about over that kind of connection.
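
For a rough sense of scale, here is a back-of-envelope sketch. Every number in it is an assumed placeholder: the 10 GB of raw sensor data per driving hour and both uplink speeds are illustrative, not measured figures for any car or carrier.

```python
# Back-of-envelope only: all constants below are assumptions for illustration.

def upload_hours(data_gb: float, uplink_mbps: float) -> float:
    """Hours needed to upload data_gb (decimal gigabytes) at uplink_mbps (megabits/s)."""
    bits = data_gb * 8e9                  # gigabytes -> bits
    seconds = bits / (uplink_mbps * 1e6)  # bits / (bits per second)
    return seconds / 3600

# Hypothetical raw sensor volume for one hour of driving.
RAW_GB_PER_DRIVING_HOUR = 10.0

for label, mbps in [("3G-class uplink (assumed 1 Mbit/s)", 1.0),
                    ("LTE-class uplink (assumed 10 Mbit/s)", 10.0)]:
    hours = upload_hours(RAW_GB_PER_DRIVING_HOUR, mbps)
    print(f"{label}: {hours:.1f} h to upload 1 h of driving")
# 3G-class uplink (assumed 1 Mbit/s): 22.2 h to upload 1 h of driving
# LTE-class uplink (assumed 10 Mbit/s): 2.2 h to upload 1 h of driving
```

Under those assumptions the upload takes longer than the drive itself, which is the point being made: whatever goes back over cellular is probably a heavily reduced summary, not raw sensor streams.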
It makes sense because virtually all machine-learning algorithms work better, and can learn faster, if you have more data.

Think of it this way: if Tesla wants to test a particular algorithm for a particular driving situation, they can "play it back" over an enormous amount of real-world situations. They will have tons more potential edge cases with which they can validate their algorithms.

Where exactly is all this data being stored and transferred? Nowhere. The car doesn't have the storage, it doesn't have the bandwidth, and all indications are that it isn't transferring anything that amounts to actual images from the many, many cameras it has.

Sure, they can push a beta algorithm to cars and record high-level decision making between human & algo, verifying it's not totally out of whack. But that's hardly something that is going as training data into the models.

This used to be true. Modern machine learning needs way less data. The classic example is taking images and then transforming them in hundreds of ways (scaling, rotation, skew, etc.) for training.
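
As a minimal sketch of what that kind of augmentation looks like, here is a dependency-free version. It assumes an "image" is just a list of rows of pixel values, and the function names are made up for illustration; real pipelines use an image library and many more transforms.

```python
# Pure-Python sketch of label-preserving image augmentation.
# An "image" is a list of rows of pixel values (an assumption made to
# keep the example free of third-party dependencies).

def flip_horizontal(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]

def rotate_90_clockwise(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def scale_2x(img):
    """Nearest-neighbour upscale: every pixel becomes a 2x2 block."""
    return [[p for p in row for _ in range(2)] for row in img for _ in range(2)]

def augment(img):
    """Return the original plus five transformed copies."""
    r90 = rotate_90_clockwise(img)
    r180 = rotate_90_clockwise(r90)
    r270 = rotate_90_clockwise(r180)
    return [img, flip_horizontal(img), r90, r180, r270, scale_2x(img)]

tiny = [[1, 2],
        [3, 4]]
for variant in augment(tiny):
    print(variant)
```

One labeled image becomes six training examples here. Note that this only works because the label (say, "cat") survives the transform, which is not a given outside of images.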

Big data is nowhere near as much of a competitive advantage as it was three years ago. It seems not everyone outside the field has noticed that, though.

I wonder if this would be true in the case of self-driving car algorithms, though (which I know nothing about). It always seemed like the hard part about self-driving cars was the 0.1% edge cases where something out of the ordinary could result in a catastrophe if not handled correctly.

Image classification seems like it would be very different, most importantly in that 99.9% "correct" would be a great achievement, whereas for self-driving cars a 0.1% failure rate would be completely unacceptable.
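
To put a number on why 0.1% is unacceptable, here is a quick sketch of how a small per-decision failure rate compounds. It assumes each driving "decision" fails independently with the same probability, which is a simplification, and the decision counts are hypothetical.

```python
# Sketch: chance of at least one failure across many decisions.
# Assumes independent decisions with an identical failure probability.

def prob_at_least_one_failure(p_fail: float, n_decisions: int) -> float:
    """Return 1 - (1 - p_fail)^n_decisions."""
    return 1.0 - (1.0 - p_fail) ** n_decisions

P_FAIL = 0.001  # the 0.1% failure rate from the comment above

for n in (100, 1_000, 10_000):  # hypothetical numbers of decisions
    print(n, prob_at_least_one_failure(P_FAIL, n))
```

With those assumptions, 1,000 decisions already give roughly a 63% chance of at least one failure, so a rate that looks excellent for a classifier is nowhere near good enough for a car.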

We are 30 to 50 years away from a Level 5 car.
Do you have more examples of "big data is not as much a competitive advantage", in the form of articles or research? I'm not in this field, but it's a fascinating development. It would be interesting to see to what degree automated transformations help increase the value of each piece of training data.
Lul thinking machine learning is only image classification
Thank you for the down-votes. I guess all the experts on HN know how easy it is to simulate training data because they all took the 101 course on how to rotate/resample images. That is uniquely an image classification technique.

Please, oh wise ones, how do we simulate NLP data, numeric data, finance data, biological data and anything else machine learning is used for?

Oh, you are able to classify dogs and cats in images after a 2-hour YouTube video. How nice.

Both you and renesd are correct.

Renesd is correct that "big data" is overblown. There are diminishing marginal returns: you need orders of magnitude more data for the same incremental gain (and this blows up well beyond however many millions of cars Tesla can hope to run).
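
A rough illustration of the diminishing-returns point, assuming the error follows a power-law learning curve, error(N) = c * N^(-alpha). Both the power-law form and the exponents below are assumptions chosen for illustration, not fitted to any real driving system.

```python
# Sketch: how much more data is needed to halve the error, assuming
# error(N) = c * N ** (-alpha). The exponents used are hypothetical.

def data_multiplier_to_halve_error(alpha: float) -> float:
    """Factor by which N must grow so that c * N**(-alpha) drops by half."""
    # (k * N) ** (-alpha) = 0.5 * N ** (-alpha)  =>  k = 2 ** (1 / alpha)
    return 2.0 ** (1.0 / alpha)

for alpha in (0.5, 0.3, 0.1):
    k = data_multiplier_to_halve_error(alpha)
    print(f"alpha={alpha}: need about {k:,.0f}x the data to halve the error")
```

With alpha = 0.5 that is 4x the data per halving, with 0.3 about 10x, and with 0.1 about 1,024x. Each further halving multiplies the requirement by the same factor again, which is how it outruns any realistic fleet.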

You're correct that data augmentation is only a marginal technique to squeeze out more performance, and not generally possible in many domains.

From what I've observed of member behavior on HN, I suspect that the downvotes may be a response to tone as opposed to content.
You need to think this through more carefully. What would "playing it back" give you? Presumably all that would give you are a bunch of incidents where the human driver disagreed with the algorithm's output. That's it. We don't know whether the algorithm is actually wrong. Maybe the driver made a mistake. Maybe the driver wanted to make an illegal turn (very common). Maybe the driver was lazy and made a rolling stop. A human still needs to go through each incident that is flagged and manually label it.
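
A minimal sketch of what "playing it back" yields in practice. The record layout and field names are hypothetical, invented for illustration; nothing here reflects how any manufacturer actually logs data.

```python
# Sketch: replaying logged frames only tells you where the human and the
# algorithm disagreed, not which of them was right.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class LoggedFrame:
    frame_id: int
    human_action: str                     # what the driver actually did
    algo_action: str                      # what the algorithm would have done
    reviewer_label: Optional[str] = None  # stays None until a human reviews it

def flag_disagreements(frames: List[LoggedFrame]) -> List[LoggedFrame]:
    """Return the frames where human and algorithm chose differently."""
    return [f for f in frames if f.human_action != f.algo_action]

log = [
    LoggedFrame(1, "stop", "stop"),
    LoggedFrame(2, "rolling_stop", "stop"),   # driver was lazy, algo may be right
    LoggedFrame(3, "brake", "continue"),      # algo may have missed something
]

flagged = flag_disagreements(log)
unlabeled = [f.frame_id for f in flagged if f.reviewer_label is None]
print(unlabeled)  # [2, 3]
```

Frames 2 and 3 look identical to the pipeline: both are just disagreements. Telling the lazy rolling stop apart from a genuine algorithm miss is exactly the manual labeling step described above.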
The criticism, though, has been that the data they have is not detailed enough.
The argument is that Tesla has more representative, real-world data.

Big public opinion perspective here too.