Hacker News
by adevine 3455 days ago
It makes sense because virtually all machine-learning algorithms work better, and can learn faster, if you have more data.

Think of it this way: if Tesla wants to test a particular algorithm for a particular driving situation, they can "play it back" over an enormous amount of real-world situations. They will have tons more potential edge cases with which they can validate their algorithms.
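A minimal sketch of what "playing back" a candidate algorithm over logged situations could look like. Everything here is hypothetical: the `Scenario` fields, the toy braking rule, and the assumption that each log entry holds a compact summary of the situation plus the human driver's action. None of this describes Tesla's actual logging.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scenario:
    # Hypothetical logged situation: a compact summary plus what the human did.
    speed_mps: float            # vehicle speed in metres per second
    obstacle_distance_m: float  # distance to the nearest obstacle ahead
    human_action: str           # "brake" or "continue"


def toy_policy(s: Scenario) -> str:
    """Toy candidate algorithm: brake if time-to-obstacle is under 2 seconds."""
    if s.speed_mps > 0 and s.obstacle_distance_m / s.speed_mps < 2.0:
        return "brake"
    return "continue"


def replay(policy: Callable[[Scenario], str], log: List[Scenario]) -> List[Scenario]:
    """Run the policy over every logged scenario; return those where it disagrees with the human."""
    return [s for s in log if policy(s) != s.human_action]


log = [
    Scenario(20.0, 30.0, "brake"),      # 1.5 s to obstacle; policy also brakes
    Scenario(20.0, 100.0, "continue"),  # 5 s to obstacle; policy also continues
    Scenario(10.0, 15.0, "continue"),   # 1.5 s to obstacle; policy brakes, human did not
]
flagged = replay(toy_policy, log)
print(f"{len(flagged)} of {len(log)} scenarios disagree")  # prints "1 of 3 scenarios disagree"
```

On this view, a bigger fleet simply means a longer `log`. Note that a disagreement only flags a scenario; it does not say whether the human or the algorithm was right.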

Where exactly is all this data being stored and transferred? Nowhere. The car doesn't have the storage, it doesn't have the bandwidth, and all indications are that it isn't transferring anything that amounts to actual images from its many, many cameras.

Sure, they can push a beta algorithm to cars and record high-level decision making by the human and the algorithm, verifying it's not totally out of whack. But that's hardly something that goes into the models as training data.

This used to be true. Modern machine learning needs way less data. The classic example is taking images and then transforming them in hundreds of ways (scaling, rotation, skew, etc.) for training.
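The comment names no particular library, so the following is a dependency-free sketch of the classic augmentation trick: one small grayscale image, stored as nested lists, is scaled, rotated, and sheared (skewed) with random parameters to produce many training variants. The parameter ranges and nearest-neighbour sampling are illustrative assumptions; a real pipeline would use an image library.

```python
import math
import random
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def affine(img: Image, scale: float, angle_deg: float, shear: float) -> Image:
    """Scale, shear and rotate an image about its centre with nearest-neighbour sampling.

    Output pixels that map outside the source image are set to 0.
    """
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    cos_a, sin_a = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    # Forward matrix M = Rotation @ Shear @ Scale, acting on (x, y) offsets from the centre.
    m11, m12 = scale * cos_a, scale * (cos_a * shear - sin_a)
    m21, m22 = scale * sin_a, scale * (sin_a * shear + cos_a)
    det = m11 * m22 - m12 * m21  # equals scale**2, so M is invertible for scale != 0
    # Inverse of M: maps each output pixel back to the source pixel that lands there.
    i11, i12, i21, i22 = m22 / det, -m12 / det, -m21 / det, m11 / det
    out: Image = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = x - cx, y - cy
            sx = round(i11 * dx + i12 * dy + cx)
            sy = round(i21 * dx + i22 * dy + cy)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out


def augment(img: Image, n: int, seed: int = 0) -> List[Image]:
    """Return n randomly scaled/rotated/sheared copies of img (ranges are illustrative)."""
    rng = random.Random(seed)
    return [
        affine(img, rng.uniform(0.9, 1.1), rng.uniform(-15.0, 15.0), rng.uniform(-0.2, 0.2))
        for _ in range(n)
    ]


sample = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
variants = augment(sample, 100)
print(len(variants))  # 100 training images derived from one original
```

Each variant keeps the original label, which is why this only works where a transformation is known not to change the answer, as the replies further down point out.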

Big data is nowhere near as much of a competitive advantage as it was three years ago. It seems not everyone outside the field has noticed that, though.

I wonder if this would be true in the case of self-driving car algorithms, though (which I know nothing about). It always seemed like the hard part about self-driving cars was the 0.1% edge cases where something out of the ordinary could result in a catastrophe if not handled correctly.

Image classification seems very different, most importantly because 99.9% "correct" would be a great achievement there, whereas for self-driving cars a 0.1% failure rate would be completely unacceptable.
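To see why a per-decision error rate that sounds tiny is not, here is the arithmetic under two loud assumptions: errors are independent, and a single trip involves a hypothetical 1,000 safety-relevant decisions. The chance of at least one failure is then 1 - (1 - p)^N.

```python
def p_any_failure(per_decision_failure: float, decisions: int) -> float:
    """Chance that at least one of `decisions` independent decisions fails."""
    return 1.0 - (1.0 - per_decision_failure) ** decisions


# Hypothetical: 1,000 safety-relevant decisions in one trip, each 99.9% correct.
print(round(p_any_failure(0.001, 1000), 2))  # prints 0.63
```

Under these assumptions, "99.9% correct" means roughly a two-in-three chance of at least one wrong decision per trip.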

We are 30 to 50 years away from a Level 5 car.

Do you have more examples of "big data is not as much of a competitive advantage", in the form of articles or research? I'm not in this field, but it's a fascinating development. It would be interesting to see to what degree automated transformations increase the value of each piece of training data.

Lul, thinking machine learning is only image classification.

Thank you for the downvotes. I guess all the experts on HN know how easy it is to simulate training data because they all took the 101 course on how to rotate/resample images. That is a technique unique to image classification.

Please, oh wise ones, how do we simulate NLP data, numeric data, finance data, biological data, and anything else machine learning is used for?

Oh, you are able to classify dogs and cats in images after a 2-hour YouTube video. How nice.

Both you and renesd are correct.

Renesd is correct that "big data" is overblown. There are diminishing marginal returns: you need orders of magnitude more data for the same incremental gain (and this blows up well beyond however many millions of cars Tesla can hope to run).
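One common way to model "diminishing marginal returns" is a power-law learning curve, error = a * N^(-alpha). That functional form and the exponent below are assumptions for illustration, not figures from this thread; under them, each fixed reduction in error costs a constant multiple of data.

```python
def error_at(n_examples: float, a: float, alpha: float) -> float:
    """Assumed power-law learning curve: error = a * n^(-alpha)."""
    return a * n_examples ** (-alpha)


def data_multiplier(error_reduction: float, alpha: float) -> float:
    """How many times more data is needed to divide the error by `error_reduction`."""
    return error_reduction ** (1.0 / alpha)


# With a hypothetical exponent alpha = 0.3, halving the error takes ~10x the data,
# and halving it again takes another ~10x (so ~100x the original amount).
print(round(data_multiplier(2.0, 0.3), 1))  # prints 10.1
```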

You're correct that data augmentation is only a marginal technique to squeeze out more performance, and not generally possible in many domains.

From what I've observed of member behavior on HN, I suspect that the downvotes may be a response to tone as opposed to content.

You need to think this through more carefully. What would "playing it back" give you? Presumably all that would give you are a bunch of incidents where the human driver disagreed with the algorithm's output. That's it. We don't know whether the algorithm is actually wrong. Maybe the driver made a mistake. Maybe the driver wanted to make an illegal turn (very common). Maybe the driver was lazy and made a rolling stop. A human still needs to go through each incident that is flagged and manually label it.
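A minimal sketch of the triage step this comment describes, assuming each flagged disagreement gets a label from a human reviewer. The `Incident` type and the label strings are hypothetical; the point is that only the incidents where the algorithm was actually wrong become corrective training data, and nothing is usable until someone has reviewed it.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Incident:
    incident_id: int
    # Set by a human reviewer; None means nobody has looked at it yet.
    # Hypothetical labels: "algorithm_wrong", "driver_error", "illegal_turn", "rolling_stop".
    review_label: Optional[str] = None


def useful_for_training(incidents: List[Incident]) -> List[Incident]:
    """Only reviewed incidents where the algorithm was actually wrong are corrective examples."""
    return [i for i in incidents if i.review_label == "algorithm_wrong"]


def review_backlog(incidents: List[Incident]) -> List[Incident]:
    """Flagged incidents that still need a human to label them."""
    return [i for i in incidents if i.review_label is None]


flagged = [
    Incident(1, "algorithm_wrong"),
    Incident(2, "driver_error"),
    Incident(3, "illegal_turn"),
    Incident(4, "rolling_stop"),
    Incident(5),  # not yet reviewed
]
print(Counter(i.review_label for i in flagged))
print(len(useful_for_training(flagged)), "usable,", len(review_backlog(flagged)), "awaiting review")
# prints "1 usable, 1 awaiting review"
```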

The criticism, though, has been that the data they have is not detailed enough.