by tansey 3616 days ago
> The data is useless unless it is annotated (e.g. a human labels where the lanes are, where the obstacles are, what the bicyclist is doing, etc.) - that's the bottleneck, not collecting large amounts of raw sensor data + driver actions.

Except that is exactly what Nvidia did, and it worked out fine for them: https://arxiv.org/abs/1604.07316


Nope. That car just does lane keeping - it doesn't even do turns or lane changes. This is all stuff that was solved 20 years ago. And even then it only achieves 98% autonomy on lane keeping - this is a task that needs 100% accuracy. You should not be running into the median every couple of miles.

Furthermore, they used left/right-offset cameras to supplement the data with examples of "bad" camera views. These cameras are not present on Tesla cars (they are only used for training purposes).

In fact, the paper actually supports my point. They collected all this data for one task, lane keeping. They subdivided the problem of autonomous driving, and managed to solve one small subproblem (the easiest subproblem of autonomous driving, solved for decades already). They avoided the need for annotators, but only because they used specialized purpose-built cameras to augment the data.
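To put the 98% figure in perspective, here is a rough sketch of the arithmetic. It assumes the paper's autonomy metric charges a fixed 6 seconds per human intervention, which is how I read it; the function names are mine, not the paper's.

```python
def autonomy_percent(interventions: int, elapsed_seconds: float,
                     penalty_seconds: float = 6.0) -> float:
    """Share of drive time counted as autonomous, in percent.

    Assumption: every intervention costs a fixed penalty_seconds.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (1.0 - interventions * penalty_seconds / elapsed_seconds) * 100.0


def interventions_implied(autonomy: float, elapsed_seconds: float,
                          penalty_seconds: float = 6.0) -> float:
    """Invert the metric: how many interventions a given score implies."""
    return (1.0 - autonomy / 100.0) * elapsed_seconds / penalty_seconds


# 10 interventions in 600 seconds scores about 90.
print(autonomy_percent(10, 600))
# 98% over one hour of driving implies about 12 interventions.
print(interventions_implied(98.0, 3600))
```

Under that assumption, 98% works out to roughly a dozen human takeovers per hour of driving.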

> They collected all this data for one task, lane keeping. They subdivided the problem of autonomous driving, and managed to solve one small subproblem (the easiest subproblem of autonomous driving, solved for decades already). They avoided the need for annotators, but only because they used specialized purpose-built cameras to augment the data.

So why not several autonomous subsystems that use specialized purpose-built cameras and don't need annotators? I'm not saying that like it's easy - obviously it's not. Just seems scalable.

The solution was specific to that subproblem. The left/right-offset cameras were for the sole purpose of providing examples of what it would look like if the car was deviating off path. The same trick would not work for any other problems. Can you think of similar camera data augmentation tricks for obstacle detection, drivable path segmentation, bicyclist signaling/intention, pedestrian detection, and so on?
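For anyone unfamiliar with the trick, a minimal sketch of the offset-camera idea follows. The `Frame` type, the sign convention (positive means steer right), and the `correction` value of 0.2 are all made up for illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Frame:
    center: str      # image from the forward camera (path or id)
    left: str        # image from the left-offset camera
    right: str       # image from the right-offset camera
    steering: float  # recorded driver command; positive = right (assumed)


def expand(frame: Frame, correction: float = 0.2) -> List[Tuple[str, float]]:
    """Turn one recorded frame into three (image, steering label) samples.

    The left camera sees the road as if the car had drifted left, so its
    label is nudged toward the right to steer back; the right camera gets
    the opposite nudge. No human annotation is involved.
    """
    return [
        (frame.center, frame.steering),
        (frame.left, frame.steering + correction),
        (frame.right, frame.steering - correction),
    ]


samples = expand(Frame("c.png", "l.png", "r.png", steering=0.0))
for image, label in samples:
    print(image, label)
```

The label for each extra view comes for free only because "steer back toward the center" is known from the camera geometry. There is no equivalent free label for where a pedestrian is or what a bicyclist intends.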