
by yummypaint 137 days ago

By leveraging Genie’s immense world knowledge, it can simulate exceedingly rare events—from a tornado to a casual encounter with an elephant—that are almost impossible to capture at scale in reality. The model’s architecture offers high controllability, allowing our engineers to modify simulations with simple language prompts, driving inputs, and scene layouts. Notably, the Waymo World Model generates high-fidelity, multi-sensor outputs that include both camera and lidar data.

How do you know the generated outputs are correct? Especially for unusual circumstances?

Say the scenario is a patch of road is densely covered with 5 mm ball bearings. I'm sure the model will happily spit out numbers, but are they reasonable? How do we know they are reasonable? Even if the prediction is ok, how do we fundamentally know that the prediction for 4 mm ball bearings won't be completely wrong?

There seems to be a lot of critical information missing.

9 comments

IMTDb 137 days ago

The idea is that, over time, the quality and accuracy of world-model outputs will improve. That, in turn, lets autonomous driving systems train on a large amount of “realistic enough” synthetic data.

For example, we know from experience that Waymo is currently good enough to drive in San Francisco. We don’t yet trust it in more complex environments like dense European cities or Southeast Asian “hell roads.” Running the stack against world models can give a big head start in understanding what works, and which situations are harder, without putting any humans in harm’s way.

We don’t need perfect accuracy from the world model to get real value. And, as usual, the more we use and validate these models, the more we can improve them, creating a virtuous cycle.

tantalor 137 days ago

It's the Pareto principle.

You can get 80% of the way to "perfect" with 20% of the effort.

dyauspitr 137 days ago

That’s just a platitude at this point. For all intents and purposes, they have solved the problem, at least in the US.

jayd16 137 days ago

I don't think you say "ok now the car is ball bearing proof."

Think of it more like unit tests. "In this synthetic scenario does the car stop as expected, does it continue as expected." You might hit some false negatives but there isn't a downside to that.

If it turns out your model has a blind spot for albino cows in a snow storm eating marshmallows, you might be able to catch that synthetically and spend some extra effort to prevent it.
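The unit-test framing above could be sketched roughly as follows. This is a toy illustration only: `plan_action`, `Scenario`, and all the numbers are hypothetical stand-ins, not anything from Waymo's stack, and a real harness would feed simulated sensor data to the planner rather than a single distance.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Scenario:
    name: str
    obstacle_distance_m: Optional[float]  # None means the road ahead is clear
    speed_mps: float
    expected: str  # "stop" or "continue"

def plan_action(obstacle_distance_m, speed_mps, decel_mps2=6.0, margin_m=2.0):
    """Hypothetical planner: stop if the obstacle is within braking distance plus a margin."""
    if obstacle_distance_m is None:
        return "continue"
    braking_distance_m = speed_mps ** 2 / (2 * decel_mps2)
    if obstacle_distance_m <= braking_distance_m + margin_m:
        return "stop"
    return "continue"

def run_scenarios(scenarios):
    """Return the names of scenarios where the planner's action differs from the expected one."""
    return [
        s.name
        for s in scenarios
        if plan_action(s.obstacle_distance_m, s.speed_mps) != s.expected
    ]

# Made-up synthetic scenarios, each with the behavior we expect to see.
SCENARIOS = [
    Scenario("clear road", None, 15.0, "continue"),
    Scenario("albino cow in a snow storm, close", 10.0, 15.0, "stop"),
    Scenario("elephant far ahead", 200.0, 15.0, "continue"),
]

failures = run_scenarios(SCENARIOS)
print("failed scenarios:", failures)
```

A failing scenario here does not prove the car is unsafe, and a passing one does not prove it is safe; it only flags cases worth a closer look.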

hnburnsy 137 days ago

Looks like they need to add blackouts and parades to that simulator...

https://www.yahoo.com/news/articles/waymo-paralyzed-parade-b...

disillusioned 136 days ago

The blackouts circumstance was because they escalate blinking/out of service traffic lights to a human confirmed decision, and they hit a spike in those requests that became a bottleneck given how lightly they were staffed. The Waymo itself was fine and was prepared to make the correct decision; it just needed a human in the loop.

In the video from the parade... there's just... people in the road. Like, a lot of small children and actual people on this tiny, super narrow bridge. I think that erring on the side of caution, rather than "think you can make it but accidentally drag a small child instead", is probably the right call, though admittedly, these cases are a bit wonky.

sznio 136 days ago

>The blackouts circumstance was because they escalate blinking/out of service traffic lights to a human confirmed decision

Which isn't really a scalable solution. In my city the majority of traffic lights switch to blinking yellow at night, with priority/yield signs taking over instead. I can't imagine a human having to approve 10 of these on any route.

xnx 135 days ago

Their blog post gives the sense that they had the human review "just to be safe" but didn't anticipate this scenario. They've probably adjusted that manual review rule and will let the cars do what they would've done anyway without waiting for manual review/approval.

joshfee 137 days ago

Isn't that true for any scenario previously unencountered, whether it is a digital simulation or a human? We can't optimize for the best possible outcome in reality (since we can't predict the future), but we can optimize for making the best decisions given our knowledge of the world (even if it is imperfect).

In other words, it is a gradient from (1) "my current prediction" to (2) "best prediction given my imperfect knowledge" to (3) "best prediction with perfect knowledge", and you can improve the outcome by shrinking the gap between 1 and 2, the gap between 2 and 3, or both.

notatoad 137 days ago

seems like the obvious answer to that is you cover a patch of road with 5mm ball bearings, and send a waymo to drive across it. if the ball bearings behave the way the simulation says they would, and the car behaves the way the simulation said it would, then you've validated your simulation.

do that for enough different scenarios, and if the model is consistently accurate across every scenario you validate, then you can start believing that it will also be accurate for the scenarios you haven't (and can't) validate.
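A minimal sketch of that validation loop, assuming you can reduce each scenario to one comparable number (here, stopping distance in meters). The scenario names, values, and the 10% tolerance are all made up for illustration; none of them are real measurements.

```python
def relative_error(simulated, measured):
    """Relative difference between the simulated value and the real-world measurement."""
    return abs(simulated - measured) / abs(measured)

def validate(pairs, tolerance=0.10):
    """Return {scenario: error} for scenarios where the simulation misses by more than `tolerance`.

    `pairs` maps a scenario name to a (simulated, measured) tuple.
    """
    outliers = {}
    for name, (simulated, measured) in pairs.items():
        error = relative_error(simulated, measured)
        if error > tolerance:
            outliers[name] = error
    return outliers

# Hypothetical stopping distances in meters: (simulated, measured on a test track).
trials = {
    "dry asphalt": (20.0, 21.0),
    "wet asphalt": (30.0, 29.0),
    "5 mm ball bearings": (45.0, 60.0),
}

outliers = validate(trials)
for name, error in outliers.items():
    print(f"{name}: simulation off by {error:.0%}")
```

An empty result only says the simulation agreed with reality on the scenarios you tested; whether that agreement extends to untested scenarios is exactly the extrapolation being debated in this thread.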

fooker 137 days ago

> from a tornado to a casual encounter with an elephant

A sims style game with this technology will be pretty nice!

ses1984 137 days ago

You could train it in simulation and then test it in reality.

inkysigma 137 days ago

Would it actually be a good idea to operate a car near an active tornado?

klysm 137 days ago

It’s autonomous!

kylehotchkiss 134 days ago

Kinda yeah, they tend to always travel northeast

bharrison 137 days ago

The tornado?

gokuldas011011 136 days ago

ML models don't have fight or flight, so we'll have to show them a tornado and teach them to run away.

YeGoblynQueenne 136 days ago

>> How do you know the generated outputs are correct? Especially for unusual circumstances?

You know the outputs are correct because the models have many billions of parameters and were trained on many years of video on many hectares of server farms. Of course they'll generate correct outputs!

I mean that's literally the justification. There aren't even any benchmarks that you can beat with video generation, not even any bollocks ones like for LLMs.

aaaalone 137 days ago

They probably just look at the results of the generation.

I mean, would I like an in-depth tour of this? Yes.

But it's a marketing blog article, what do you expect?

parliament32 137 days ago

> just look at the results of the generation

And? The entire hallucination problem with text generators is "plausible sounding yet incorrect", so how does a human eyeballing it help at all?

inkysigma 137 days ago

I think that because there's no single correct answer here, the model is allowed to be fuzzier. You still mix in real training data, and maybe more physics-based simulation, of course. But it does seem acceptable to synthesize extreme tail-case evaluations, since by definition there isn't really a "better" way, and you can evaluate the end driving behavior after training.

You can also probably still use it for some kinds of evaluation as well since you can detect if two point clouds intersect presumably.

In much the same way, LLMs are not perfect at translation but are widely used anyway for NMT.
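The point-cloud intersection check mentioned above could look something like this brute-force sketch. It assumes each cloud is just a list of (x, y, z) points in meters, and treats "intersect" as "some pair of points is closer than a threshold"; the function name, the 5 cm threshold, and the sample points are all hypothetical. Real pipelines would use a spatial index instead of comparing every pair.

```python
import math

def clouds_intersect(cloud_a, cloud_b, threshold_m=0.05):
    """True if any point in cloud_a lies within threshold_m of any point in cloud_b.

    Brute force, O(len(cloud_a) * len(cloud_b)); fine for a toy example only.
    """
    return any(
        math.dist(p, q) <= threshold_m
        for p in cloud_a
        for q in cloud_b
    )

# Made-up points: a car outline along the x axis and two pedestrian positions.
car = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
pedestrian_far = [(5.0, 3.0, 0.0), (5.0, 3.0, 1.0)]
pedestrian_hit = [(2.0, 0.03, 0.0), (2.0, 0.03, 1.0)]

print(clouds_intersect(car, pedestrian_far))
print(clouds_intersect(car, pedestrian_hit))
```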

aaaalone 136 days ago

You should be able to see if it is generated wrong after you see a car driving in it.

I can spot hallucinations in LLMs too.