by findthewords 335 days ago
As a control engineer who knows something about sensor fusion - No LIDAR no ride. Musk can brute-force his unsafe robotaxis on the road but they won't be as safe as Waymo. Maybe people won't care if the price is slightly cheaper vs. competition, I don't know.
8 comments

The non-technical answer is that people do not care so long as it's safer and less unpleasant than the current alternative.

My last taxi ride involved jumping red lights and speeding through residential streets at 60mph because, I assume, it was early morning and the driver had learned from experience that he could get away with driving like this.

The previous experience to this was a lecture about a certain religious ideology and how I should spend the next two weeks reading up about it.

Remember that the robotaxi industry is currently doing everything they can to look good. They are being watched like hawks. Look at how everyone was horny for Uber rides, until Uber needed to actually make money so they jacked up fares, cut driver pay and let anyone with a piece of shit car join. Or how Google search has been enshittified.

In 10 years when robotaxi companies are short on cash and trying to IPO they will absolutely start speeding. They'll lock the doors and give you a paid presentation about Scientology during your ride. "Accidentally" drive you to a competing store that paid for sponsored traffic, instead of the .

Then customers choose a self-driving taxi which doesn't try to kill them. People aren't forced to take the murder taxis.
>In 10 years when robotaxi companies are short on cash and trying to IPO they will absolutely start speeding

Openly commit crimes when everything's being recorded and subject to discovery? What could possibly go wrong?

>They'll lock the doors and give you a paid presentation about Scientology during your ride.

Sounds like a good excuse for whoever's trapped to break open the side windows because he "felt he was in danger because of claustrophobia" or whatever. More seriously though, I don't see anything wrong with mandatory ads as long as it's disclosed ahead of time.

> "Accidentally" drive you to a competing store that paid for sponsored traffic, instead of the .

By your own admission "Google search has been enshittified", yet when was the last time google "accidentally" sent you to the competitor's site on the search results page?

Uh, every Google search already shows me sponsored links to competitors’ sites at the top of the results.

What would you say if in 5 years with Waymo, you want to go to Burger King but before taking you there, it suggests ‘hey, why don’t you go to this McDonald’s instead?’ At first you’d get a discount on your ride if you accept the suggestion, and then once everyone gets used to it, no more discounts and everyone ends up paying more than human taxis cost today.

Over the years, I’ve had four human driven taxis crash, and two taxi drivers rob me.
Never heard of those two occurrences happening for a taxi customer, ever.

Perhaps you live in a location that lends itself to such events...

Before there was an alternative, I used to take taxis in Toronto occasionally, and the common refrain was that the card machine was broken. And sometimes no change was available. These kinds of soft scams were common.

So it’s not a hold-up, but definitely a form of robbery.

Crashes: New York, Istanbul, SW UK, Cairo.

Robbed: Bishkek, Astrakhan. Former nicked my SLR at a gas stop, I should have been more attentive, but wasn’t expecting him to loot my luggage. Latter delivered me to his buddies who threatened violence unless I turfed over every penny I had. Joke was on them as they thought I was a loaded oil exec when I was actually just a broke backpacker.

I’ve also had the old shake-down fare in Ljubljana, Bucharest, and Riga, off the top of my head - but I don’t count that as robbery, just assholes.

Can I ask what time periods?

I have no personal experience of Kyrgyzstan or Russia, but my hunch would be that the noughties were riddled with taxi drivers like the ones you describe, and that this has slightly improved over time? I mean, the perestroika era was known for having those problems, wasn't it? Correct me if I am wrong please, anybody, thank you.

Also, kudos on your travel.

Crashes, various points over the last few decades. Three fender benders but the guy in Istanbul managed a proper one at speed, early hours of the morning and going too fast in the rain - thankfully nobody hurt but his car was trashed.

Robberies, both 2012, on the same trip.

And yes, corruption still rules the roost in a lot of old eastern bloc countries, from the government to the cops to the babushka driving the bus - it will take a long while for the mindset that the USSR inculcated to dissipate - that is, that the only way to get ahead in life is to lie, cheat, and steal. 2012 was my first time east of the Urals, and these days I know far better how to deal with it than back then, when I was still wet behind the ears.

If that happened to me in America I would count it as robbery.
Chicago taxis were pretty bad. Had taxis driven by a guy who wasn't the one on the card. Broken seat belts. Mega speeding. It was terrible.
I was in a Chicago taxi where the driver was basically passing out but (I think) chewing qat to keep awake
Well, you can't say that again.
Removing LIDAR was the eye-opening moment for me re: Musk. The justification given at the time was technically bogus while it was so obviously in response to supply issues during COVID. If Tesla really wanted safe FSD they would prioritize sensor quality and multimodal input. They went for the bottom line instead.
Tesla's never had LIDAR to remove. They had short distance ultrasonic parking sensors that they removed.
They also had and then removed radar.
Thank you, I think I’m conflating the radar that was present with his staunch anti-LIDAR stance. You can s/LIDAR/radar/ in my original comment.
Not true, they had lidar in at least one model. It's written in his bio too.
They had lidar on some models they were using for internal testing, but they never sold them.
No, they were production models.
And now he's peddling more horseshit and expecting the public and investors to swallow it:

> One analyst asked about the reliability of Tesla’s cameras when confronting sun glare, fog, or dust. Musk claimed that the company’s vision system bypasses image processing and instead uses direct photon counting to account for “noise” like glare or dust.

I call bullshit. Photon counting requires specialized cameras that are simply not present on Teslas, not to mention lab conditions (so you can direct photons at your sensor versus, you know, having them scatter into the atmosphere...). And for that reason those cameras don't do anywhere near as well at regular image processing.

But Musk thinks you're not smart enough to know this.

So much of the stuff he's said about rockets is designed the same way, to sound smart to people who don't know any better.

Surely photon detection wouldn't be cheaper than LIDAR, given it's impossible to mount to a car lol

The innovative stuff SpaceX has done is actually smart. I can't speak to anything else that Elon is working on, but he is actually a competent rocket engineer.
Many LIDARs are blinded by direct sunlight, and often see highly reflective surfaces like mirrors as black holes or distant-range areas. These still need tertiary safety sensors like mm-wave RADAR for safety in dust/rain/sunlight.

In cities, high-speed rail and e-bikes make more sense than a honking traffic jam at 4am. lol =3

The part that I always found difficult to square was not that Elon Musk toed this “no need for LiDAR” line so hard, but that Andrej Karpathy, who I generally consider a very reliable voice in this space, was also in strong agreement that cameras were all that was needed. Does anyone know if he still believes cameras is all you need?

Edit: Here is a link to Karpathy discussing the trade-offs: https://www.youtube.com/watch?v=cdiD-9MMpb0&t=5276s

> but that Andrej Karpathy, who I generally consider a very reliable voice in this space, was also in strong agreement that cameras were all that was needed.

You really expect a Tesla employee to speak out against Elon?

Especially when $10M+ TC is on the line?

But Andrej is no longer with Tesla?
Andrej "train it on more data and the problem will go away" Karpathy
Right? I'm not arguing against the skills he obviously has, but if we're always just piecewise approximating the underlying manifold, then there will always be new problems. The amount of data required to reliably approximate reality, in the absence of an inductive bias, is infeasible to expect to collect. Not to mention how computationally inefficient it becomes as your model blows out in size/complexity.
I guess the gamble was that there would be a certain point where the edge cases disappeared into the noise and it didn't matter that the approximation was/is "wrong" because the behavior of the car would match the requirements of the situation even if it wasn't for the right reasons.

To be fair I remember reading about GPT2 and thinking that LLMs would blow out for similar reasons.

Which they more or less have. Larger models are seeing negligible returns. It just turned out that scaling would hold out just enough longer to make LLMs generally useful.
Isn't it ultimately a cost trade off? I mean I can't see a valid argument against LiDAR and cameras if the cost of the vehicle is no concern.

If building a mass market product though the cost is a big deal.

I would assume LiDAR is much more expensive so it would be a big win to get the same performance out of cameras in the long run. I have always just assumed that was the bet.

I'm curious to know how you would act as a driver, cyclist, or pedestrian if you saw a Robotaxi nearby. Would you be more cautious, and if so in what way? Or are you mostly worried about LIDAR-less vehicles running into white walls, which they can't identify as solid walls?
I already stay tf away from Teslas if i can help it, I'm taking my bike to a different street if there's a driverless one
can binocular cameras replace lidar you think? they should result in just as reliable distance estimation
No, they don't. Look at what has happened when a Tesla has mistaken a motorcycle with two small rear lights that is nearby for a car that is further away but with the same lighting configuration. Did not end well for the motorcyclists.
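
For a rough sense of why stereo range gets shaky at distance, here is a minimal sketch using the standard pinhole stereo relation Z = f * B / d. The focal length, baseline, and half-pixel matching error are assumed values for illustration, not figures for any real car.

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    """Depth from the pinhole stereo relation Z = f * B / d."""
    return focal_px * baseline_m / disparity_px


def depth_error(focal_px, baseline_m, depth_m, disparity_err_px):
    """First-order depth error: dZ ~= Z**2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_err_px


# Assumed rig: 1000 px focal length, 0.3 m baseline, 0.5 px matching error.
FOCAL_PX, BASELINE_M, ERR_PX = 1000.0, 0.3, 0.5

near = depth_error(FOCAL_PX, BASELINE_M, 10.0, ERR_PX)   # error at 10 m
far = depth_error(FOCAL_PX, BASELINE_M, 100.0, ERR_PX)   # error at 100 m
```

With these assumed numbers the same half-pixel error costs about 0.17 m at 10 m and about 17 m at 100 m, since the error grows with the square of the distance.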

He's just wrong about this.

I don't think it's wrong, but I do think models available right now lack the inductive bias required to solve the task appropriately, and have architectural misalignments with the task at hand that mean for a properly reliable output you'll need impossibly large models and impossibly large/varied datasets. Same goes for transformers for language modelling. Extremely adaptable model, but ultimately not aligned with the task of understanding and learning language, so we need enormous piles of data and huge models to get decent output.
Can cameras do it eventually is a bit of a tangent.

All the information is there in a video feed, but the amount of work to get reliable perception from it is not small. With LIDAR and radar you get to the end goal with less uncertainty.

The real key is that things like LIDAR are designed to work well with the types of tasks computers are good at, like taking a bunch of precise measurements every second and performing complex calculations, while a binocular vision based understanding of the world is something humans are good at because we evolved that ability over millions of years.

You can probably eventually ("never" is a long time after all) get a computer to understand the world as well as a human purely through camera based sensors, but it's a much more difficult task than taking an approach that uses tools computers are already good at. Similarly, I suspect it would be an uphill battle to have a human drive using raw LIDAR input.

I think you underestimate how many guesses, approximations, and filling in your brain does to what you think you're seeing.
Absolutely. The goal of self-driving is to be better than human drivers. Even the best drivers struggle with the sun shining from low angles, or road reflections, or snow, and so on.
They are complementary sensors. It's a much easier engineering feat to combine two (cheap) sensors that are good at different things and fusing this information than creating one perfect sensor that does everything.

Private moon landers (the Japanese being most recent one) keep crashing because they rely on a single high-quality altimeter and expect it to work perfectly, all the time. If they had a complementary low-quality backup altimeter that operated independently, they would have had a less failure prone distance estimation system.
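
To make the "two cheap complementary sensors" point concrete, here is a minimal sketch of inverse-variance fusion plus a simple disagreement check. It assumes independent sensors with known variances; the readings, variances, and function names are hypothetical and not taken from any lander or car.

```python
import math


def fuse(measurements):
    """Inverse-variance weighted fusion of independent (value, variance) pairs."""
    weights = [1.0 / var for _, var in measurements]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, measurements)) / total
    return value, 1.0 / total


def disagree(a, b, k=3.0):
    """True if two readings differ by more than k combined standard deviations."""
    (va, var_a), (vb, var_b) = a, b
    return abs(va - vb) > k * math.sqrt(var_a + var_b)


# Hypothetical altitude readings in metres as (value, variance).
primary = (100.0, 1.0)   # precise altimeter
backup = (104.0, 4.0)    # coarse, independent backup
fused_value, fused_var = fuse([primary, backup])
```

With these made-up numbers the fused estimate is 100.8 m with variance 0.8 m^2, better than either sensor alone, and a backup reading of 250 m would trip the disagreement check instead of the primary being silently trusted.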

Ever heard of optical illusions? If a brain can be fooled by input from its two cameras like this, what hope does a dumb (or worse, artificially “intelligent”) computer have?
I think optical illusions are a poor choice to illustrate this point. They are manifestations of the corner cases, peculiarities, and side effects of our visual processing system and neither cameras nor Lidar are without their own analogous issues.
If you’re saying cameras have analogous issues I fail to see how the analogy is a poor choice - looks to me like you understood exactly the point I was making.
Would Musk's argument about sensor fusion have made any sense back when they were doing lots of hand-coded C++?

I've been thinking maybe vision-only was a reasonable decision, back when lidar was expensive and the software was hand-coded. Now it isn't, because lidar is cheaper and the software is an end-to-end neural net, and additional sensors are just more inputs to the network which will learn to use them. But Tesla is locked in because of the promises they made to early FSD buyers.

Sensor fusion is pretty straightforward. You can think of it like sorting algorithms in CS. There's a bunch of standard techniques simple enough to teach undergrads that work fine in production, and enough technical depth beyond them to last the rest of your career.

If you actually look back at the E2E tweet, Musk only says that the NN replaced 300k lines of "control code". Control code usually doesn't encompass the entire AV software stack, but neither should it take 300k LOC. As far as I'm aware no one is 100% sure what they mean by E2E and if it's actually the standard meaning or something else that's been widely misinterpreted.

Their engineers, who obviously are on the bleeding edge, going out of their way to avoid sensor fusion issues says something quite different. I could believe "Straightforward" could maybe apply for something many tiers below in complexity and safety requirements to what they're doing. But adding non-agreeing, non-uniform information sources to the most capable real-world ML vision system not driven by human-engineered code?

I don't care even if you said you had 70% of the experience their team has, what you say can't sound reasonable or caring for actually improving safety in numbers.

I've been through 3 public AV launches. I don't lead with that because "resume" measuring contests are boring and what I write should be evaluated on its own merit, not by who says it. This account is readily identifiable to anyone who knows me.

With that out of the way, it's much easier to write a hardware safety case than one for software. It's easier to write a software safety case for a traditional software architecture than one based on ML. It's much easier to write a safety case for focused models than an E2E system. None of these should be controversial statements.

You're arguing that Tesla is deliberately jumping to the hardest safety case in order to avoid the simpler safety case. I know many people who have worked on the relevant teams at Tesla. I don't think they're ignorant of the difficulty of Tesla's choices or making decisions based on what's easier to validate.

Sensor fusion is not the difficult part. You only have to look at the fact that virtually every other company in the industry (even the ones using E2E and not using LIDAR like Wayve) does it. Either we have to accept that everyone else is stupid, or there are other factors involved in Tesla's decision-making.

For what it's worth, I'm not sure if I could write a safety case about Tesla's system that meets my own personal standards. It's pretty clear from their actions and the various regulatory/legal inquiries that they don't have what I consider an effective safety process regardless.

They go out of their way to avoid sensor fusions because they are not allowed to use other sensors. Some people seem to have no limits to their credulity.
Yes. They are disallowing themselves from using other sensors. To avoid complicating the software stack and ultimately to keep it higher performing, which will make it safer in any allotted time.

This is really no different from good code practices with a complex system. Most HN readers should be familiar with rot in codebases quite comparable to "adding a few extra features to make it better" which just became a maintenance burden and take away from core features.

If you knew the slightest thing about how Musk has actually stayed exactly the same since the beginnings of Tesla, you'd know his hard specs are always technology-based. People that know more have said that his capability of speccing systems relatively right deep into the future is possibly his single greatest leadership feat.

>Once Musk is near-certain about one technology pathway over another, he’s not afraid to put massive amounts of resources into that path, while still staying flexible enough in the case that a new emerging technology disrupts that particular path. Because he’s willing to make enormous (and seemingly risky) bets on these pathways, he’s able to outpace his competitors. https://www.quora.com/Is-Elon-Musk-all-that-hes-cracked-up-t...

Meanwhile, pretty much everybody else doing safety-critical real-time control prefers multiple sensor types.

For an example of that in self-driving, Huawei uses vision, lidar, radar, and ultrasound. Out of Spec let it drive them around for an hour in busy traffic in a city in China. It looked about as good as FSD, without having had several million cars providing training data for years.

People really act like SpaceX didn't rain fifteen tons of concrete onto employees' cars because Musk decided he wanted to be the first person in history to launch a heavy rocket without a flame diverter, and none of the brilliant engineers were able to convince him otherwise or had the guts to otherwise scuttle the launch for the sake of safety.

Clearly Tesla engineers will do the same

> and the software is and end-to-end neural net,

Well, considering Musk's record for adherence to reality, the question is whether you can really believe that, or whether Musk only thinks this is happening, or whether it is not happening at all and Musk is just making it up.

Given what's happening in the rest of the AI field, it seems pretty likely that it's correct.

Plus various Tesla engineers have said the same thing, Tesla does have a very large AI training cluster, and FSD quality made a big jump when they claimed to deploy end-to-end.

You've got it exactly backwards.

When the software was hand-coded, having Lidar and a high-res map was vital.

But if you have a good enough AI, sensors that replicate the human senses are all that is needed.

The real question is: when will we get good enough AI that can be applied to all cars?

The question isn't whether it can be done eventually, but whether it would get safer faster with extra sensors.

Back in the day, Elon's specific objection to lidar was that it was too hard to code the sensor fusion part. With end-to-end there's no coding to deal with.

And it's not like they've replicated the visual cortex. It's the same neural net technology everybody else is using. It can deal with any sort of sensors just fine.

> But if you have a good enough AI, sensors that replicate the human senses are all what is needed.

What is your example or evidence for this?

Concentrated and well-rested drivers with good eyesight not under the influence are able to drive very well.
So human senses are all we need and we just have to solve artificial general intelligence and fully simulate a human brain? I wonder why no one has done that yet.
Yes, but we're a long way from putting AI hardware in cars that's as powerful as the human brain.

Plus our cameras aren't as good as human eyes.

It's not "Musk's argument," it's Andrej Karpathy's.

Also, if you've ever done any ML you would note that more data isn't always better. Plus there's the piece about which thing to believe when you get conflicting data. There's a lot more to it than what random Hacker News people are saying in this thread.

Conflicting data is an issue even when you just have lidar, and it's not hard to deal with.

The cofounder of Waymo taught one of the first Udacity courses on this subject. He went through a small Python project that processed lidar point clouds for self-driving. The data is noisy, you get conflicting information from different points, and the code aggregates all that into the most likely 3D model of the world.
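
Not the course's actual code, but the "aggregate noisy, conflicting returns into the most likely model" step can be sketched with a log-odds occupancy grid, which is a standard technique. The 70% hit reliability, the cell coordinates, and the function names are all assumptions for illustration.

```python
import math

# Assumed sensor model: a single return is right 70% of the time.
L_HIT = math.log(0.7 / 0.3)
L_MISS = math.log(0.3 / 0.7)


def update(log_odds, observations):
    """Accumulate evidence per cell; observations are (cell, hit) pairs."""
    for cell, hit in observations:
        log_odds[cell] = log_odds.get(cell, 0.0) + (L_HIT if hit else L_MISS)
    return log_odds


def probability(l):
    """Convert log-odds back to an occupancy probability."""
    return 1.0 - 1.0 / (1.0 + math.exp(l))


# Hypothetical conflicting returns for two grid cells.
obs = [
    ((3, 4), True), ((3, 4), True), ((3, 4), False), ((3, 4), True),
    ((5, 1), False), ((5, 1), True), ((5, 1), False),
]
grid = update({}, obs)
```

Cell (3, 4) gets three hits against one miss and ends up likely occupied, while cell (5, 1) ends up likely free, even though each cell saw contradictory readings.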

Additional sensor inputs are just more of the same, and neural nets are pretty good at this sort of thing. They'd even learn which sensors are more reliable in different scenarios.

As for "more data isn't always better," I've mostly seen that applied to training, not inference in real-time control systems. Even for training, it turned out people had been fooled by a local maximum, and once past that, more data really was better.

I have a question:

- What are the main challenges in building software that relies solely on camera input?

- Which specific modules or tasks still require LiDAR to function reliably?

Camera vision and LIDAR perfectly complement each other. Camera vision is no good at detecting unknown/outlier obstacles quickly and accurately. LIDAR is great at detecting unknown obstacles quickly and accurately.

You can tune the camera obstacle detection to be hyper-sensitive, which results in phantom braking, causing passengers to feel that the car is "unreliable" while it actually is safer. Humans are better at braking the appropriate amount when they see something strange, dynamically tuning their sensitivity in a new situation.

You can relax the sensitivity, which will reduce false alarms, but will actually cause more crashes, deaths, and injuries. You don't want your customers to feel unsafe, so from a business perspective you will inevitably reduce the sensitivity.
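
The sensitivity trade-off can be put in toy numbers. This sketch assumes detector scores are Gaussian with unit variance, centred at 0 for harmless clutter and at 2 for real obstacles; those distributions and the function names are made up for illustration.

```python
import math


def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rates(threshold, obstacle_mean=2.0):
    """Return (false_alarm_rate, miss_rate) for a given detection threshold."""
    false_alarm = 1.0 - phi(threshold)       # clutter scoring above threshold
    miss = phi(threshold - obstacle_mean)    # real obstacle scoring below it
    return false_alarm, miss


sensitive = rates(0.5)   # hyper-sensitive tuning
relaxed = rates(1.5)     # relaxed tuning
```

Under these assumptions, moving the threshold from 0.5 to 1.5 takes the detector from roughly 31% false alarms and 7% misses to roughly 7% false alarms and 31% misses: fewer phantom brakes, more missed obstacles.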

> What are the main challenges in building software that relies solely on camera input?

Probably the main challenge is that it took nature about a billion years to get to human level visual perception and understanding of environment and nobody really knows how to duplicate it.

Using tools like LIDAR can fill some gaps.

What version of Lidar does your head have?
Counterquestions

- What living creature is using wheels to move around?

- What kind of birds come strapped with a jet engine?

Sometimes non-natural solutions are easier and often better than attempting to replicate nature at any cost. Imagine your logic applied to a plane - birds flap their wings, thus this 737 should spread its wings and flap away like a goose. Now take a military goose and flap fast enough to get supersonic...

> - What living creature is using wheels to move around?

A human driving a car.

I agree with you that sometimes non-natural solutions are easier and can absolutely be better, but the point of bringing up humans is generally to show that it demonstrably is possible to do at least as well as humans, with humans as an existence proof.

But that it might well take too long and cost too much to get there, and that it might well in the end be cheaper and better to use additional types of sensors is a good point.

There's no reason to think the best performing, or even adequately performing, technological solution to a problem would mirror how humans have solved it. Submarines don't swim like fish after all.

But more specifically to this case, human eyes are attached to brains with (generally) vastly better image recognition and reasoning abilities than any camera based self-driving car. Because of this, humans are better able to recognize visual input even in degraded or unusual conditions compared with a computer.

That's a bit like saying cars should use legs instead of wheels.

(Also, I don't know what star-system you grew up on but my lidar sensors are next to the tubular sheaths on my cephalothorax, right where Xoc'tlz'ik (the Creator) intended them to go.)

My head contains intelligence hardware that far outstrips anything in these cars. Plus my cameras are a lot better.

Despite all this, a lot of modern cars are adding lidar to make things easier for me.

Your "cameras" have about 20 MP of active resolution between the two of them. With dead zones and pixels spread out unevenly. A modern smartphone has you beat.

There's a small, sharp, high resolution color-enabled area in each eye - but the bulk of your vision field is monochrome, and mostly sensitive to motion.

You don't notice that, because your image data is stacked and post-processed to shit to make it presentable. Your brain has been doing computational photography before it was cool - 90% of what you see at any moment in time is effectively AI-generated.

Your head has a general intelligence, so it can do sensor fusion in a way a car can only dream of.
It's not that we lack lidar, but what we have in addition to "cameras".

We possess a spatial intelligence (e.g. how your brain has an approximation of: It feels like I walked three blocks) that will never exist in this "photons-in" "controls-out" fantasy.

There's a solid argument to be made that solving the self driving problem using only cameras might end up being roughly equivalent to solving the AGI problem because you will have essentially created a computer with a human understanding of the world around it using human senses.
I have thought that Tesla will be forced to learn many things from the camera-only approach.

And they may well end up with something "that works". But... many buts.

What? An FSD Tesla has its very own "world model". It doesn't try to reconstruct a world "photons in", from scratch, 60 times per second. It continuously updates and refines the data it already has based on the sensor inputs, and then uses this internal representation to make driving decisions.
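
The "update rather than rebuild" idea is the same one behind any recursive estimator. A minimal sketch with a 1D Kalman filter tracking one object's position follows; it illustrates the general technique only, is not a description of Tesla's software, and all names and numbers are made up.

```python
def predict(x, p, velocity, dt, process_var):
    """Carry the estimate forward in time; uncertainty grows."""
    return x + velocity * dt, p + process_var


def update(x, p, z, meas_var):
    """Blend the prediction with a new measurement z; uncertainty shrinks."""
    k = p / (p + meas_var)   # Kalman gain
    return x + k * (z - x), (1.0 - k) * p


def track(x, p, measurements, velocity=1.0, dt=1.0,
          process_var=0.1, meas_var=1.0):
    """Run the filter; None means the object was not seen in that frame."""
    for z in measurements:
        x, p = predict(x, p, velocity, dt, process_var)
        if z is not None:
            x, p = update(x, p, z, meas_var)
    return x, p


# Hypothetical frames: the object is occluded for two of them.
x, p = track(0.0, 1.0, [1.1, 2.0, None, None, 5.2])
```

During the occluded frames the estimate keeps moving on prediction alone, which is a crude form of the object permanence mentioned in the thread.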

This "world model" is what you get to peek into through the car's screen. By now, it even has basic "object permanence". Nowhere near as good as a human yet. But AI is getting better, and an average driver isn't.

No amount of LIDAR wankery can solve self-driving.

Take any self-driving car crash where the self-driving car was found at fault. Dump the blackbox, extract the raw sensor data. What will you see?

You'll see that the car had all the sensory data it needed to make the right call, many times over. And it didn't make the right call. That's not a "sensors" problem. The sensors are good enough. The main bottleneck for self-driving is, and always was, in AI.

Which is why you get things like that Cruise car dragging a pedestrian despite being equipped with 360 cameras and a total of 5 overlapping LIDARs. It had the sensors. What it didn't have was object permanence.

Well an under-carriage camera would have obviated this failure mode.

So you could define it as a sensor issue

How many car crashes does waymo have per mile compared to average drivers?
Statistically, SOTA self-driving cars are already superhuman. That holds for Waymo and Tesla both. They crash less, like for like, and the incidents they get into are less severe. But that's not because self-driving cars outperform a "top of the line" human driver. It's because they outperform the absolute worst bottom of the barrel human driver.

A big part of a self-driving car's "safety edge" is that it isn't going to go 80 in a 40, doesn't fall asleep at the wheel, and isn't capable of DUI.

Self-driving cars still struggle in some situations most human drivers wouldn't find challenging - AI issues - but they don't make the worst, the most unforced and avoidable "human factor" mistakes.