Hacker News
by natvert 1785 days ago
This is a problem with learning the appropriate context. In the case of a yellow traffic light, there should be a traffic light fixture and two other (unlit) lights. Without this context, it becomes apparent the yellow pixels in a circular shape (representation to the network) are not a traffic light at all.

Tesla’s vision/ml systems are amazing. I would love to learn more about how unit testing for this type of error is done. Without some intermediate semantic representation, I don't see how these large, multi-head, end-to-end systems can isolate and regression-test a failure like this. System tests are maybe possible, but it's unclear how well a system test would generalize to related but unseen cases.

8 comments

> Tesla’s vision/ml systems are amazing

More importantly, they're the only ones accessible to the average consumer. Waymo/Zoox/Daimler all have equally if not more impressive systems.

A real issue with Tesla is that they want to be vision-only, which is going to make getting to level 5 first almost impossible.

BTW - knowing where the moon is happens to be an extremely solved problem.

> knowing where the moon is happens to be an extremely solved problem.

It's true... but I don't think that's how Tesla would solve it. Their goal is to create a neural network "driver" which can drive in any place even if it has never seen it before. They'd rather teach their neural network that the moon and stoplights are not to be confused visually. Though I suppose in searching for training examples they could use the known position of the moon for approximate labeling.

>which can drive in any place even if it has never seen it before

Isn't that impossible considering these networks need training and therefore have seen everything before?

What I mean specifically is that competitors' self-driving systems use "HD maps", meaning they store the entire world in 3D and then localize themselves to that world (at least, this is what Karpathy says competitors do). So those systems cannot navigate any stretch of road they haven't seen.

But with the Tesla, it is learning to drive in general. It does not need an HD map of a fork in the road to understand how to navigate it. Just as a person who learned to drive in California will have little trouble driving in Florida, a neural network that has learned to drive on a million intersections will be pretty good at navigating most intersections. Especially because the corner cases will stand out and become integrated into training. So it may see many intersections, but it will generally know what to do with one even if it has never seen it before.

Though I would suspect that the competitors are perhaps using HD maps to jumpstart a system that long term would behave more like the Tesla one. Mapping every road is a lot to ask.

Complicated - they should be able to piece together a route with reasonably up to date "street view but like for self driving" imagery, up to date maps, and reasonable weather conditions.

One actually really important feature of these systems is how they handle failure. If the car gets confused, how does it handle it?

I would have treated that as a stretch goal, and a very ambitious one at that. I won't need an autopilot to take me through the woods any time soon, especially if it is only as trustworthy as it feels right now. I'd be happy with one that drives reliably on the highway and wakes me up when we're close to the destination.

Well that’s what Tesla is doing. The current system can drive itself on highways and they are pushing to create a system capable of driving on all roads.

> In the case of a yellow traffic light, there should be a traffic light fixture and two other (unlit) lights. Without this context, it becomes apparent the yellow pixels in a circular shape (representation to the network) are not a traffic light at all.

Permanent yield signals are often only one flashing yellow light. Crosswalk signals are often only flashing yellow lights when active. Temporary construction barrier signals are often only yellow flashing lights. Fire station signals often have only red and yellow lights, no green. Even when all three lights are present, they may also be oriented horizontally, or in triangular shapes.

Which isn't to say there isn't more context to learn from, but just about the only true unifying trait among all these indicators that you should perhaps pay attention and slow down is a bit of yellow light. It need not even be circular: yellow arrows are far from uncommon, including "straight ahead" yellow arrows for intersections where turns are forbidden.

You can learn about their systems by watching talks by Andrej Karpathy. As a robotics engineer interested in vision, I find their architecture inspiring. This talk [1] is a good overview, but each talk he gives is a little different, so search for more if you want to know as much as possible.

But the big thing is that their autonomy computer can be programmed to look for odd scenarios and send them back home. Tesla uses their fleet of hundreds of thousands of cars to collect edge cases like this, and then they have a kind of compartmentalized neural network system that breaks apart disparate tasks. With their collected examples they can create unit tests to ensure that the moon stops activating the stoplight detector. Once trained, the unit tests presumably help ensure they don't end up with future regressions.
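
To make the "collected examples become unit tests" idea concrete, here is a minimal sketch of a regression gate. Everything in it is a made-up stand-in, not Tesla's actual tooling: `Clip`, its boolean fields, the two toy detectors, and `release_allowed` are all hypothetical names, and a real clip would be video frames rather than a few flags.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Clip:
    """Toy stand-in for a labeled video clip (all fields are hypothetical)."""
    name: str
    yellow_disc: bool         # a round yellow blob is visible
    grows_on_approach: bool   # the blob gets bigger as the car moves forward
    is_signal: bool           # ground-truth label, never read by a detector


Detector = Callable[[Clip], bool]


def naive_detector(clip: Clip) -> bool:
    # Fires on any round yellow blob, so it mistakes the moon for a signal.
    return clip.yellow_disc


def retrained_detector(clip: Clip) -> bool:
    # Assumed fix: also require the blob to grow as the car approaches.
    return clip.yellow_disc and clip.grows_on_approach


def failing_clips(detector: Detector, suite: List[Clip]) -> List[str]:
    """Names of the clips where the detector disagrees with the label."""
    return [clip.name for clip in suite if detector(clip) != clip.is_signal]


def release_allowed(detector: Detector, suite: List[Clip]) -> bool:
    """A candidate model ships only if it gets every collected clip right."""
    return not failing_clips(detector, suite)


# Edge cases collected from the fleet become permanent test cases.
SUITE = [
    Clip("moon_low_on_horizon", True, False, False),
    Clip("single_flashing_yellow", True, True, True),
    Clip("empty_highway", False, False, False),
]

if __name__ == "__main__":
    for label, detector in [("naive", naive_detector),
                            ("retrained", retrained_detector)]:
        print(label, failing_clips(detector, SUITE),
              release_allowed(detector, SUITE))
```

The point of the design is that the moon clip stays in the suite forever, so a later model that reintroduces the confusion fails the gate instead of shipping.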

So basically every time you see a Tesla do a weird thing, there is a good chance it will stop doing it soon enough. At least if it's hitting Hacker News.

[1] https://www.youtube.com/watch?v=hx7BXih7zx8

But in foggy or dark situations, a yellow light can indeed appear as a floating yellow circle.

There was another video of a Tesla being confused by a truck carrying traffic lights on the truck bed. I do like the idea of what Waymo does, which is to run their model through a virtual world to see how it performs. It's amusing to imagine each of these edge cases being added in as they are encountered.

I've definitely driven through places where there is only one light. Blinks red for stop (no stop sign). Many times, for cross traffic you'll have the one light but blinking yellow as a warning at the intersection.

If that were a traffic light in front of the car, it would get bigger as the car moves towards it. Basic perspective knowledge. But it seems the system is not considering it or not putting enough weight on it.
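
The perspective argument can be put in rough numbers. This is only a back-of-the-envelope sketch: the 0.3 m lens diameter and the 100 m starting distance are assumptions picked for illustration, the moon figures are the usual approximate mean values, and `angular_diameter_deg` and `growth_ratio` are hypothetical helper names. The apparent size of a disc of diameter d at distance D is 2·atan(d / 2D).

```python
import math


def angular_diameter_deg(diameter_m: float, distance_m: float) -> float:
    """Apparent angular diameter, in degrees, of a disc seen face-on."""
    return math.degrees(2.0 * math.atan(diameter_m / (2.0 * distance_m)))


def growth_ratio(diameter_m: float, distance_m: float,
                 travelled_m: float) -> float:
    """How many times larger the disc looks after driving travelled_m closer."""
    before = angular_diameter_deg(diameter_m, distance_m)
    after = angular_diameter_deg(diameter_m, distance_m - travelled_m)
    return after / before


LENS_DIAMETER_M = 0.3      # assumed signal lens size
MOON_DIAMETER_M = 3.474e6  # approximate
MOON_DISTANCE_M = 3.844e8  # approximate mean distance

if __name__ == "__main__":
    # Drive 50 m towards a signal that starts 100 m away, and "towards" the moon.
    print("signal:", growth_ratio(LENS_DIAMETER_M, 100.0, 50.0))
    print("moon:  ", growth_ratio(MOON_DIAMETER_M, MOON_DISTANCE_M, 50.0))
```

Under these assumptions the signal roughly doubles in apparent size while the moon's ratio stays indistinguishable from 1, which is the cue the comment says the system seems to be underweighting.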

Karpathy's latest talk says they have 6000 test cases (video clips) that each new version of the model has to predict the right answer on before it can be released.