Hacker News
by recursivecaveat 93 days ago
I definitely agree that in principle a computer can drive with cameras alone. I don't know whether it's a useful statement. Likewise, a human can determine the genre of a movie merely by watching it. I wouldn't suggest to Blockbuster in 1990 that they should collect no genre metadata for movies because the database server should sort it out on its own. (Nowadays somewhat feasible with ML of course, but 20+ years later.) What sensors/data you need is a question of where computers are now or will shortly be, and it seems that for now they need the extra structure of LIDAR for best effectiveness.

>I definitely agree that in principle a computer can drive with cameras alone.

Obvious things first: cameras have way worse contrast and low-light sensitivity than human eyes.

Humans have a much more evolved capacity for logical thinking; even the stupid ones can figure out stuff that modern AI struggles with.

Humans have other sensors, too, that they use to plausibility-check the picture they see, i.e. one of the best sensor fusion systems on the planet.

When in doubt, humans can figure out whether it's a lens occlusion or some other artifact in their vision by moving their head around.

There are probably other things I'm not thinking of. In any case, to make full self driving work we should first use all available tech to make it safe. When you have safe tech you can slowly start removing individual sensors while verifying that safety remains high. As experience grows and the system evolves there will be optimization potential.

And until we have low-light sensitivity and high contrast figured out, cameras alone don't cut it.

Unrelated to FSD, what's a good example where frontier AI struggles with logical thinking that even stupid humans can figure out?

I personally feel like that isn't really true any more.

The recent one was "should I drive my car to the car wash if it's only 300 feet from my house?", although it wasn't a slam dunk.

Right, but if these things are so rare that we all only know the one viral example, I feel like that lends credence to the models generally not having this problem.

Researchers built the Winograd Schema Challenge more than a decade ago to assess common sense reasoning, and LLMs beat that challenge task around GPT-4.

They're not so rare. Hallucinations have been spotted everywhere, but the "driving a car to the car wash" one is an amusing example that's been recently publicised. Developers aren't going to point out every time an LLM hallucinates an entire library.

I'd add to this: any moderately involved logical or numerical problem causes hallucinations for me on all frontier models.

If you ask them in isolation they may write a script to solve it "properly", but I guess this is because the labs added enough of these to the training set. This workaround doesn't scale.

As soon as I give the LLM a proper problem and a small part of it requires numeric reasoning, it almost always hallucinates something and doesn't solve it with a script.

If the logic/math is part of a larger problem the miss rate is near 100%.

LLMs have massive amounts of knowledge, encoded in verbal intelligence, but their logical intelligence is well below even average human intelligence.

If you look at how they work (tokenization and embeddings) it's clear that transformers will not solve the issue. The escape hatches only work very unreliably.

If you ask this of any current-day AI it will answer exactly how you would expect: telling you to drive, and acknowledging the comedic nature of the question.

That's because AI labs keep stamping out the widely known failures. I assume they do it without actually retraining the main model, but with some small classifier that detects the known meme questions and injects the correct answer into the context.

But try asking your favorite LLM what happens if you're holding a pen with two hands (one at each end) and let go of one end.
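
To make the "small classifier that injects the answer" speculation above concrete, here is a minimal sketch of what such a guard could look like. Everything in it is hypothetical: the `KNOWN_GOTCHAS` table, the function names, and the hint format are made up for illustration, and plain keyword matching stands in for a real classifier. Nothing here is confirmed to be what any lab actually does.

```python
import re
from typing import Optional

# Hypothetical table: a trigger phrase for a widely known "gotcha" prompt,
# mapped to the hint that would be injected into the model's context.
KNOWN_GOTCHAS = {
    "car wash": "The car must be at the car wash to be washed, so it has to be driven there.",
}


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation/whitespace so matching is less brittle."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def detect_gotcha(prompt: str) -> Optional[str]:
    """Stand-in for the speculated classifier: return a hint if a known trigger appears."""
    cleaned = normalize(prompt)
    for trigger, hint in KNOWN_GOTCHAS.items():
        if trigger in cleaned:
            return hint
    return None


def build_context(prompt: str) -> str:
    """Prepend the hint for known gotchas; pass every other prompt through unchanged."""
    hint = detect_gotcha(prompt)
    if hint is None:
        return prompt
    return f"[hint: {hint}]\n{prompt}"
```

A guard like this only covers prompts someone has already listed, so a question that is not in the table (the pen one, say) passes through untouched. That is the point of the complaint: patching known failures says little about the unknown ones.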