Hacker News
by apnew 1213 days ago
> The FSD Beta system may cause crashes by allowing the affected vehicles to: “act unsafe around intersections, such as traveling straight through an intersection while in a turn-only lane, entering a stop sign-controlled intersection without coming to a complete stop, or proceeding into an intersection during a steady yellow traffic signal without due caution,” according to the notice on the website of the National Highway Traffic Safety Administration.

Does anyone have insights on what QA looks like at Tesla for FSD work? Because all of these seem table-stakes before even thinking about releasing the BETA FSD.

8 comments

> Does anyone have insights on what QA looks like at Tesla for FSD work? Because all of these seem table-stakes before even thinking about releasing the BETA FSD.

Tesla is not exactly in love with QA. Especially for FSD.

FSD is mainly 2 things: 1. (By far most important) shareholder value creating promise, that's been solved for 6 years according to their CEO. 2. Software engineering research project

What FSD is not is a safety-critical system (which it should be). They focus on cool ML stuff and getting features, with complete disregard for how to design, build and test safety-critical systems. Validation and QA are basically non-existent.

Do you have actual knowledge of Tesla's internal QA processes, or any kind of source at all?

Based on their presentations, they for sure have a whole load of tests, many built directly from real-world situations that the car has to handle. They simulate sensor input based on the simulation and check that the car does the right thing.

They very likely have some internal test drivers and before the software goes public it goes to the cars of the engineers.

Those are just some of the things we know about.

I have no source on their approach to testing safety-critical systems, but we do know that they have a lot of software that has passed all the tests by all the major governments. They are one of the few (or the only) car makers fully compliant with a number of standards on automated braking in the US. We have many real-world examples of videos where other cars would have killed somebody and the Tesla stopped based on image recognition.

So they do clearly have some idea of how to do this stuff.

So when making these claims I would like to know what they are based on. It might very well be true that their processes are insufficient, but I would actually like to see some real data. Part of what a government could do is force car makers to open up their QA processes.

Or the government could (should) have its own open test suite that a car needs to be able to handle, but clearly we are not there yet.

2 sources.

1. I know people working at Tesla.

2. Much more important one - Elon's Twitter feed. They're doing last-minute changes, and once it compiles and passes some automated tests, it's tested internally for only a few days before it's released to customers. Even if they had world-class internal testing (they don't), for something that has to work in an environment as diverse as that of a self-driving system without any geo-fencing, those timelines are all you need to know.

Some manufacturers hold off on newer, untested tech for years before adding that to their vehicles. This is what happens when safety is a priority.

That's why I bought/will keep buying Toyota/Lexus.

Euro NCAP etc. seem to classify Teslas as (some of) the safest vehicles on roads.

https://www.euroncap.com/en/results/tesla/model+y/46618

Same for NHTSA:

https://www.nhtsa.gov/vehicle/2022/TESLA/MODEL%2525203/4%252...

Because they have a lower center of gravity and good crash structure. These are good things. But avoiding the accident in the first place significantly reduces the need to test that crashworthiness.
That's really not the point.

Because of the FSD false promises, Tesla encourages dangerous behavior from drivers.

I don't want to be next to a Tesla driving in autonomous mode while its driver at the wheel is not paying attention to me.

I strongly feel people ought to have these discussions while consistently citing actual data sources relevant to the discussion.

For example, did you predict, based on the speculation of Tesla being incompetent with regard to safety, that they have the lowest probability of injury scores of any car manufacturer? Because they do.

Did you predict, based on speculation about Elon Musk's incompetence in predicting that self-driving would happen, that there are millions of self-driving miles each quarter? Because there are.

Did you predict, based on speculation about Tesla incompetence in full self-driving, that the probability of accident per mile is lower rather than higher in cars that have self-driving capabilities? Because it is.

I know this sort of view is very controversial on Hacker News, but I still think it is worth stating, because I think people are actually advocating for policies which kill people because they don't actually know the data disagrees with their assumptions.

https://www.tesla.com/VehicleSafetyReport

Unaudited (internal Tesla data), cherry-picked (comparing with average cars in USA, which are 12 years old beaters, to their very young fleet of expensive cars) data, that doesn't correct for any bias (highway driving vs non-highway driving being one of the many issues) is not exactly the magic bullet you think it is.

Also, none of that is self driving. This data talks about AP, not FSD. FSD is also not self driving by any means (it's level 2 driver assist), but that's a detail at this point.

I didn't say it was a magic bullet. So you are hallucinating thoughts about me, not responding to what I said. Being critical of the data like you are being is good thinking in my opinion. I just don't like how often people don't have beliefs that are anywhere close to the data.

For example, elsewhere in this comment thread, someone threw out a random statistic of 400:1 as part of their argument, but this seems to me to be something like six orders of magnitude diverged from a data informed estimate.

To try and contextualize how big an error that is - it is like thinking that a house in the Bay Area has the same cost as a soft drink.

I think if we have to cite our data we are less likely to do that sort of error and more likely to catch it when it is done.

I definitely don't think FSD is magically safe. So if you think that is what I'm trying to say, please update your beliefs according to my correction that I do not believe this. I think anyone driving in FSD should remain vigilant, because it can make worse decisions than a human would.

A system that protects 400 people but kills 1 is not a system that I want on public roads because I don't want to be in the 1 - Elon and the children of Elon are basically making the assumption that everyone is okay with this.

The probability of an accident for any driver assistance system will ALWAYS be lower than a human driver - but that doesn't mean the system is safe for use with the general public!

People like me are not advocating for "killing people" because we aren't looking at data - it's that no company has the right to make these tradeoffs without the permission and consent of the public.

Also if this was about safety and not just a bunch of dudes who think they are cool because their Tesla can kinda drive itself, why does "FSD" cost $16,000?

> People like me are not advocating for "killing people"

If you are advocating against a system that protects 400 people and kills one, you are advocating for killing people.

> A system that protects 400 people but kills 1 is not a system that I want on public roads because I don't want to be in the 1 - Elon and the children of Elon are basically making the assumption that everyone is okay with this.
>
> The probability of an accident for any driver assistance system will ALWAYS be lower than a human driver - but that doesn't mean the system is safe for use with the general public!

Totally we should be wary of a system that protects 400 and kills 1. Thank you for providing the numbers. It helps me show my point more clearly.

If you are driving on a road you encounter cars. Each car is a potential accident risk. You probably encounter a few hundred cars after ten or so miles. Not every car crash kills, but let's just assume they all do to make this simpler. For the stat you propose, you are talking about feeling uncomfortable with an accident rate somewhere in the ballpark of one per ten miles.

Now let's look at the data. The data suggests the actual figure is closer to 6,000,000 miles per accident. That is about six orders of magnitude away from the miles-per-accident figure that you imply would make you feel uncomfortable.
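
To make that comparison concrete, here is a minimal sketch of the arithmetic. It uses only the two figures from this comment (roughly ten miles per accident implied by the 400:1 framing, and roughly 6,000,000 miles per accident from the linked report); the function and variable names are made up for illustration.

```python
import math


def orders_of_magnitude_apart(a: float, b: float) -> float:
    """Return how many powers of ten separate two positive quantities."""
    return abs(math.log10(a / b))


# Rough figure implied by the 400:1 framing (an assumption from this comment).
implied_miles_per_accident = 10
# Figure quoted in this comment from the safety report.
reported_miles_per_accident = 6_000_000

gap = orders_of_magnitude_apart(
    reported_miles_per_accident, implied_miles_per_accident
)
print(f"{gap:.2f} orders of magnitude")
```

The gap comes out a little under six, so "six orders of magnitude" is a rounded-up description of the difference between the two figures.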

Let's try shifting that around to a context people are more familiar with: a one dollar purchase would be a soft drink and a six million dollar purchase would be something like buying a house in the Bay Area. This is a pretty big difference I think. I feel very differently about buying a soft drink versus buying a house in the Bay Area. If someone told me they felt that buying a house was cheap, then gave a proposed price for the house that was more comparable to the cost of buying a soft drink, I might suspect they should check the dataset to get a better estimate of the housing prices, because it might give them a more reasonable estimate.

So I very strongly feel we should cite the numbers we use. For example, I feel like you should really try and back up the use of the 400 to 1 number so I understand why you feel that is a reasonable number, because I do not feel that it is a reasonable number.

> Also if this was about safety and not just a bunch of dudes who think they are cool because their Tesla can kinda drive itself, why does "FSD" cost $16,000?

Uh, we are on a venture-capitalist-adjacent forum. You obviously know. But... well, the price of FSD is tuned to ensure the company is profitable despite the expense of creating it, as is common in capitalist economies with healthy companies seeking to make a profit in exchange for providing value. It is actually pretty common for high-effort value creation, like the creation of a self-driving car or the performance of surgery, to carry higher prices.

Interesting graph, I like that it's broken out into quarters. But,

1) those are statistics for the old version, the new version might be completely different. I've had enough one-line fixes break entire features I was not aware of that my view is that any change invalidates all the tests. (Including the tests that Tesla should have but doesn't) Now probably a given update does not cause changes outside its local area, but I can't rely on that until it's been tested.

2) the self-driving is presumably preferentially enabled for highway driving, which I assume has fewer accidents per mile than city driving, so comparing FSD miles to all miles is probably not statistically valid.
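
A small sketch of why point 2 matters. Every number here is invented purely for illustration (none of it comes from Tesla or NHTSA), and the per-road crash rates are deliberately identical for assisted and manual driving. The pooled rates still differ, only because the two groups drive a different mix of roads.

```python
# All numbers are made up to illustrate the confounding effect.
# Rates are crashes per million miles and are the SAME for assisted and
# manual driving on each road type.
RATE_PER_MILLION_MILES = {"highway": 0.5, "city": 2.0}


def pooled_rate(miles_by_road: dict) -> float:
    """Overall crashes per million miles for a given mix of road types."""
    crashes = sum(
        RATE_PER_MILLION_MILES[road] * miles
        for road, miles in miles_by_road.items()
    )
    return crashes / sum(miles_by_road.values())


# Hypothetical exposure in millions of miles.
assisted_mix = {"highway": 90.0, "city": 10.0}  # mostly highway
manual_mix = {"highway": 30.0, "city": 70.0}    # mostly city

assisted = pooled_rate(assisted_mix)
manual = pooled_rate(manual_mix)
print(f"assisted: {assisted:.2f}, manual: {manual:.2f} crashes per million miles")
```

In this toy setup the assisted miles look more than twice as safe even though the system changes nothing, which is why a fair comparison would need to be broken out by road type.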

I agree with you. I would really like to see datasets that reflect how things actually are. I think it would be really dangerous to jump to FSD being safe on the basis of the data I shared. However I would hope that whatever opinions people shared were congruent with the observed data. I don't feel that the claim that Elon Musk and Tesla don't care about safety is congruent with the observed data, which shows that Autopilot has improved safety.

Just for context - I've been in a self-driving vehicle. Anecdotally, someone slammed on the brakes. The car stopped for me, but I was shocked: for hours before this the traffic hadn't changed, it was a cross-country trip. I think I would have probably gotten in an accident there. Also anecdotally, there are times where I felt the car was not driving properly. So I took over. I think it could have gotten into an accident. Basically, for me, the best explanation I have for the data I've seen right now is that human + self-driving is currently better than human and currently better than self-driving. The interesting thing about this explanation is how well it tracks with other times where we've had technology like this before. In chess playing for example, there was a period before complete AI supremacy (which is what we have now) where human + AI was better than AI.

I like the idea of being safe, so if the evidence goes the other way, advocating for only humans or only AI doing the driving, I want to follow that evidence. Right now I think it shows the mixed strategy is best and that is kind of nice to me because it implies that the policy that best collects data to reduce future accidents through learning happens to be the policy that is currently being used. I like that.

As any Tesla supporter will tell you, Autopilot != FSD.

(Is Autopilot still limited to divided, limited access highways? Those are significantly safer than other roadways.)

> Is Autopilot still limited to divided, limited access highways

No. Was it ever? All you need is a piece of road that has something which appears to be lane lines. The road to my house is usable despite having no actual paint striping because it happens to have a crack that runs fairly straight up one side and was filled with tar. So the camera thinks it's a lane line. Ta-da!

This report is for Autopilot, not FSD which everyone else is talking about on HN.
Good point.

The thing is we often have discussions about this stuff and I'm trying to advocate for citing datasets to more tightly correlate our words with the evidence that our words correspond to. I'm not trying to say this version shouldn't have been recalled for example, but that I think we should be close to evidence.

In the case of Autopilot, people made the same arguments that are now being made against FSD. I think that makes it somewhat relevant to the discussion, because people previously also made the same claims about safety, but now that we have the data, we can see those claims were wrong. I believe these sorts of generalizations, though inaccurate, can help us to make more informed decisions, but I'm not really confident in any beliefs that are formed at this greater distance from direct data.

So I think anyone who can provide datasets which correspond with FSD performance rather than autopilot performance ought to do so. That would be really great data to reflect on.

The thing I'm worried about is that no data at all is backing the conjectures - which, given that I sometimes see estimates that I calculate to be many orders of magnitude away from data informed estimates - seems to be the case on Hacker News at least some of the time.

Please ignore all the times I'm wrong in favor of all the times I'm right!
I agree that people who don't cite the evidence are ignoring the evidence? Are you trying to say I'm doing that by pointing to relevant datasets which track the number of accidents and the probability of injury? If so, why are there accidents tracked in the datasets such that the rate can be calculated? This kind of contradicts the claim that I'm asking to ignore, but I definitely agree that other people are ignoring the data if that is what you are trying to say.
No, your argument is just ridiculous. The standard isn't and shouldn't be how much they get right. It should be what they get wrong and how they do that. I completely disagree with your point, and phrasing it obtusely just makes you obnoxious from a conversational standpoint.
> We have many real-world examples of videos where other cars would have killed somebody and the Tesla stopped based on image recognition.

I think you and I must've watched a different video.

Yes I have also seen many videos where it makes mistakes. But also many where it prevented them.
The person above you has no idea what they’re talking about. There’s literally hundreds of people at Tesla whose job is QA and tools to support QA
And how does that change anything about my statements?

Yeah, they have QA. But for the problem they claim they’re solving (robotaxis) and speed of pushing stuff to customers (on the order of days), it is vastly, vastly insufficient. And it lacks any safety lifecycle process - again, just look at the timelines. Even if you’re super efficient, you cannot possibly claim you can do even such basic things as proper change management (no, a commit message isn’t that) or validation.

> it lacks any safety lifecycle process

completely demonstrably false

> speed of pushing stuff to customers (on the order of days)

this is also false and doesn't happen

> you cannot possibly claim you can do even such basic things as proper change management (no, a commit message isn’t that) or validation.

you know absolutely nothing about the internal timelines of developments and deployments at tesla and to suggest it's impossible without that knowledge is just dishonest

> > it lacks any safety lifecycle process
> completely demonstrably false

The head of AP testified under oath that they don't know what an Operational Design Domain is. I'll just leave it at that.

> > speed of pushing stuff to customers (on the order of days)
> this is also false and doesn't happen

Has Musk never, ever tweeted about a .1 release fixing some critical issues coming in the next few days? I must live in a different timeline.

> > you cannot possibly claim you can do even such basic things as proper change management (no, a commit message isn’t that) or validation.
> you know absolutely nothing about the internal timelines of developments and deployments at tesla and to suggest it's impossible without that knowledge is just dishonest

Let's assume I have no internal information. If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.

> and speed of pushing stuff to customers (on the order of days)

well, if you don't get the software pushed to the QA team (the customers), how else are they going to get it tested?

can we please stop with this disinformation? the customers are not the QA team.
Andrej Karpathy was the AI lead for most of the project and he has talked about the general system design.

They have a set of regression tests they run on new code updates either by feeding in real world data and ensuring the code outputs the expected result, or running the code in simulation.
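
As a rough idea of what "feed in recorded data and check the output" can look like, here is a minimal sketch. Everything in it (the `RecordedScenario` fields, `plan_action`, the scenario names) is hypothetical and is not Tesla's code or API; it only illustrates the replay-and-compare pattern described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordedScenario:
    """A hypothetical logged situation plus the action reviewers expect."""
    name: str
    sign_detected: str     # e.g. "stop" or "none"
    speed_mph: float
    expected_action: str   # e.g. "full_stop" or "proceed"


def plan_action(sign_detected: str, speed_mph: float) -> str:
    """Stand-in for the system under test (an assumption, not a real planner)."""
    if sign_detected == "stop":
        return "full_stop"
    return "proceed"


def run_regression(scenarios, planner):
    """Replay each scenario and return the names of the ones that fail."""
    return [
        s.name
        for s in scenarios
        if planner(s.sign_detected, s.speed_mph) != s.expected_action
    ]


SCENARIOS = [
    RecordedScenario("stop_sign_clear_day", "stop", 25.0, "full_stop"),
    RecordedScenario("open_road", "none", 45.0, "proceed"),
]

failures = run_regression(SCENARIOS, plan_action)
print("failures:", failures)
```

A suite like this only catches behaviors that someone has already recorded and labeled, which is one way a regression could slip through.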

It does seem worrying that they would miss things like this.

Here’s a talk from Karpathy explaining the system in 2021:

https://youtu.be/aNVbp0WKYzY

Though I don’t recall if he explains the regression testing in this talk, there’s a few good ones on YouTube.

It's not even a bit surprising they'd miss things like this, IMHO. They do tests with a few (maybe even a lot of) intersections, but there are thousands upon thousands of intersections out there, including some where bushes are obscuring a stop sign, or the sign is at a funny angle, or sunlight is reflecting off the traffic lights, or heavy rain obscuring them, or plain old ambiguous signage...there's _bound_ to be mistakes. Human drivers make similar mistakes all the time.

I used to think that fact was going to delay self-driving cars by a decade or more, because of the potential bad press involved in AI-caused accidents, but then along comes Tesla and enables the damn thing as a beta. I mean...good for them, but I've always wondered if it was going to last.

I've been using it pretty consistently for a few months now (albeit with my foot near the brake at all times). I haven't experienced any of the above. Worst thing I've seen is the car slamming on the brakes on the freeway for...some reason? There was a pile-up in a tunnel caused by exactly that a month or so ago, so I've been careful not to use FSD when I'm being tailgated, or in dense traffic.

You know, there was an article on here last week about how there are only 4 billion floats, so just test them all.

There are only like 16 million intersections in the US. Why not test them all?

The thing is you already know everything you need to know about all 4 billion floats. Collecting data on every intersection in the US is quite difficult.
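
For what "just test them all" means on the float side, here is a minimal sketch. A 32-bit float has 2**32 bit patterns (about 4.29 billion), all of which can be enumerated. To keep the example fast, it exhaustively checks the much smaller 16-bit half-precision format instead, using the `e` format of Python's `struct` module; the round-trip property being checked is just an example chosen for illustration.

```python
import math
import struct


def exhaustive_roundtrip_check() -> int:
    """Decode and re-encode every 16-bit float pattern.

    Returns the number of non-NaN patterns that were checked.
    """
    checked = 0
    for bits in range(2 ** 16):
        raw = bits.to_bytes(2, "little")
        (value,) = struct.unpack("<e", raw)
        if math.isnan(value):
            continue  # NaN payloads are not guaranteed to survive a round trip
        assert struct.pack("<e", value) == raw, hex(bits)
        checked += 1
    return checked


print(2 ** 32)  # number of distinct 32-bit patterns
checked = exhaustive_roundtrip_check()
print(checked)
```

The whole input space is known up front and can be generated in a loop. Nothing comparable exists for intersections, where each input first has to be observed in the real world.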

Tesla does however collect data on edge cases and then train their system to respond correctly. They can for example train a collection network to identify things that might be obscured stop signs, then have the fleet collect a whole bunch of examples, hand label those samples, and roll this new data into the training system. This is explicitly how they handle edge cases.

They can also create a new feature or network and roll it out in “shadow mode” where it is running but has no influence on the car, and then they can observe how these systems are behaving in the real world.
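
A minimal sketch of the shadow-mode idea, under the assumption that it boils down to "run both, act on one, log disagreements". The policies, thresholds, and observation fields are all hypothetical and only illustrate the pattern.

```python
def active_policy(obs: dict) -> str:
    """Hypothetical production policy: the only one allowed to control the car."""
    return "stop" if obs["stop_sign_confidence"] > 0.8 else "go"


def candidate_policy(obs: dict) -> str:
    """Hypothetical new policy under evaluation, with a lower threshold."""
    return "stop" if obs["stop_sign_confidence"] > 0.5 else "go"


def drive_with_shadow(observations, active, shadow):
    """Act on the active policy only; record frames where the shadow disagrees."""
    commands, disagreements = [], []
    for frame, obs in enumerate(observations):
        command = active(obs)          # what the vehicle actually does
        shadow_command = shadow(obs)   # computed and logged, never executed
        if shadow_command != command:
            disagreements.append((frame, command, shadow_command))
        commands.append(command)
    return commands, disagreements


observations = [
    {"stop_sign_confidence": 0.95},
    {"stop_sign_confidence": 0.60},  # e.g. a partly obscured sign
    {"stop_sign_confidence": 0.10},
]
commands, disagreements = drive_with_shadow(
    observations, active_policy, candidate_policy
)
print(commands)
print(disagreements)
```

The logged disagreements are the interesting part: they point at exactly the frames where the new system would have behaved differently, without it ever having touched the controls.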

The real issue I guess is when they release a new feature without trialing it in shadow mode, or if they have gaps in their testing and validation system.

> Does anyone have insights on what QA looks like at Tesla for FSD work?

Yes. An army of Tesla owners perform the QA, in production.

Well first it goes to influencers that say it's perfect and good for stable release no matter what the car does!

But in all seriousness they do have some small team that validates then it goes to employees.

That's the thing about neural networks: any QA is going to be superficial due to their statistical black box nature.
Exactly. It's the same reason that no amount of unit tests can replace formal methods for safety-critical software, and we cannot apply formal methods to neural nets [yet].
That's not really true. Most safety critical software is tested without formal verification. They are just really really thorough and rigorous.

Formal verification is obviously better if you can do it. But it's still really really difficult, and plenty of software simply can't be formally verified. Even in hardware where the problem is a lot easier we've only recently got the technology to formally verify a lot of things, and plenty of things are still out of reach.

And even if you do formally verify some software it doesn't guarantee it is free of bugs.

What? Black box testing has plenty of techniques: https://en.wikipedia.org/wiki/Black-box_testing

Whether it's a neural network inside or not is completely irrelevant. That's why it's called "black box".

Practical neural networks operate in enormous parameter spaces that are impossible to meaningfully test for all possible adversarial inputs and degraded outputs. Your FSD could well recognize stop signs in your battery of tests but not when someone drew a squirrel on it with green sharpie.
Something a bit similar is clinical trial and it is accepted without problem.

You make a black box test on several thousand (sometimes only hundreds of) patients, and if patients who received the drug perform better than the patients who received the placebo, then the drug is usually accepted for commercialization.

Yet one isolated patient may be subject to several comorbidities, her environment could be weird, she could ingest other drugs (or coffee, OTC vitamins or even pomelo) without having declared it. In the recent past women were not part of clinical trials because being pregnant makes them very "non-standard".

> Something a bit similar is clinical trial and it is accepted without problem.

Clinical trials also have strict ethical oversight and are opt-in. If clinical trials were like Teslas, we'd yeet drugs into mailboxes and see what happened.

First of all, clinical trials are typically longer and more thorough than you imagine, they span years. The fact that COVID vaccines were fast-tracked gives people wrong idea about it.

Secondly, even after the product hits the market the company is still responsible for tracking any possible adverse effects. They have a hotline where a patient or doctor can report it, and every single employee or contractor (including receptionists, cleaning staff, etc.) is taught to report such events through proper internal channels if they accidentally learn about them.

> clinical trials are typically longer and more thorough than you imagine, they span years

I don't know where you get that, most clinical trials last 26 weeks, even in phase III.

and about "more thorough than you imagine" no, most are subcontracted to CROs and the way clinical trials are conducted is messy.

Below is a story from the POV of a PI.

But similarly many patients complain about the way they are treated in visits and the lack of interest of the nurse/doctor who receive them.

https://milkyeggs.com/biology/why-are-clinical-trials-so-exp...

Your run of the mill computer program also "operates in enormous parameter spaces that are impossible to meaningfully test for all possible adversarial inputs and degraded outputs".
This is hardly similar as the state of a typical computer program can be meaningfully inspected, allowing both useful insights for adversarial test setups and designing comprehensive formal tests.
Right, if you consider the internal state, it is hardly similar. You talked about black box and QA though. Black box by definition holds the internal state as irrelevant, and QA mostly treats the software it tests as a black box, or in other words the tests are "superficial" as you call it.
This seems to ignore that if you look inside the box at code you could understand it whereas looking at the activation values is unlikely to illuminate.
As a human being and motor vehicle operator of many decades I have done all of the above, multiple times (very infrequently), both on purpose and on accident. I’m looking forward to the days when self-driving vehicles are normal, and human drivers are the exception. Until then, I’m glad companies and regulators are holding the robots to a higher standard than the meat computers.
> As a human being and motor vehicle operator of many decades I have done all of the above, multiple times

Time to stop driving. That is not normal

It also does not know what one way street, do not enter, road closed, and speed limit signs are. Really, the only signs it appears to know about are stop signs.

As for their QA process, in 2018 they had a braking distance problem on the Model 3. They learned of it, implemented a change that alters the safety critical operation of the brakes, then pushed it to production to all Model 3s without doing any rollout testing in less than a week [1]. So, their QA process is probably: compiles, run a few times on the nearby streets (I am pretty sure they do not own a test track as I have never seen a picture of tricked out Teslas doing testing runs at any of their facilities), ship it.

[1] https://www.consumerreports.org/car-safety/tesla-model-3-get...

Teslas have understood speed limit signs since 2020.[1]

1. https://finance.yahoo.com/news/upcoming-tesla-software-2020-...

It uses maps for that.

I have a winding road near me with a speed limit of 35 mph, but 15 mph on certain curves as indicated by a speed limit sign. It ignores those speed limit signs and will attempt to make the turns at 35 mph resulting in it wildly swerving into the other lane and around a blind turn with maybe 30 feet of visibility. It has also attempted to do it so poorly that it would have driven across the lane and then over the cliff without immediate intervention.

Unsupported claims by a manufacturer that compulsively lies about the capabilities of their products except when directly called on it are the opposite of compelling evidence.

I'm talking about standard speed limit signs. You're talking about the signs that warn about sharp turns and advise maximum speeds. Yes it would be good if the software understood those signs, but that's a different issue.

Teslas definitely read speed limit signs. I've had mine correctly detect and follow speed limits in areas without connectivity or map data. It also follows speed limits on private drives (if there is a sign) and obeys temporary speed limit signs that have been put up in construction zones.

So they read some, but not all speed limit signs, and especially not the really important ones that inform you that you will be going dangerously fast if you do not read and follow them. That is criminally unacceptable.
Can you name any car manufacturer that has software to read those signs?
>I'm talking about standard speed limit signs. You're talking about the signs that warn about sharp turns and advise maximum speeds.

Cognitive dissonance at its finest.

These are not the speed limit signs you are looking for!
Not according to the recall the NHTSA posted that is the subject of this entire thread....
There are many such tests in the open.

There is even a former Tesla AI engineer that throws objects in front of the car on YouTube, as a demonstration.

The results are not glorious at all :| (trying to find the channel back if someone knows).

And random public tests too: https://www.youtube.com/watch?v=3mnG_Gbxf_w

This is a basic safety auto-braking. Just feels very wrong to even accept it goes into release.

This is not a former Tesla engineer. This is a competitor who wants to discredit Tesla and sell its own solution.

The guy behind this is known to be untrustworthy, and many of the videos don't actually do what he claims. Notably he refused to release the videos that would prove his claims right.

The reality is that Tesla scores high on all the automated braking tests done by governments. The driver however can override this, and that is exactly what is being done in this video.

So they scored high on whatever version of software was on the specific car tested by government at some point in time. Has any government done any testing at all on the version of software actually in use today?
Automated braking is a standard part of the software on every Tesla, and it is the software that was tested by multiple governments. This gives you certification. If Tesla changed or removed that software, it would be highly illegal. Tesla is actually notable in that it is one of the only companies that has implemented the highest standard of this safety feature on every car. These features are something Tesla is proud of and has talked about quite a bit. They also mention that they have achieved the highest scores in these evaluations.

The test above has been replicated, and the car does brake in automated driving. It also gives a loud warning to the driver even under normal driving operation and performs emergency braking.

Tesla assumes that if the driver hits the accelerator after the warning, the driver wants to accelerate based on the driver's judgment.

This is what this video shows. Notably, the person who made this video has refused to release proof that the car actually was in Autopilot, refused to provide evidence that the driver didn't hit the accelerator, and refused to provide audio from inside the car so it can be verified that there was no warning sound.

In addition to that, the same person also claims things like 'millions of people will die if Autopilot isn't stopped' and that even under absurd assumptions is a legit insane thing to say.

So what is more likely: that Tesla did something incredibly illegal by removing an incredibly important piece of safety-critical software that is standard in every Tesla, OR that a competitor is doing a PR campaign where they deliberately set up a situation to film a video where they can create maximum damage to Tesla and then advocate for their own solution?

https://www.tesla.com/blog/model-y-earns-5-star-safety-ratin...

So the question is who do you believe, Euro NCAP or a competitor who made a sensationalist viral video.

You are probably thinking of: https://www.youtube.com/@AIAddict
I suspect they have thousands of tests, but ship code that passes only most of the tests...

That's what makes it unfinished...

It's never passed the 'drive from New York to LA with nobody touching the controls' test...