To be fair, it is far more complex for a robot to grip a spatula and use it on a griddle than to use dynamic motion to flip a pancake in a pan.
Solving any one problem with robotic manipulation isn’t all that hard. It takes a lot of trial and error, but in general, if the task is constrained, you can solve it reliably. The trick is to solve *new* tasks without resorting to all that fine-tuning every time. That is what Russ is claiming here. He’s training an LLM with a corpus of one-off policies for solving specific manipulation tasks, and claiming to get robust ad hoc policies from it for previously unsolved tasks.
If this actually works, it’s pretty important. But that’s the core claim: that he can solve ad hoc tasks without training or hand tuning.
> He’s training an LLM with a corpus of one-off policies for solving specific manipulation tasks, and claiming to get robust ad hoc policies from it for previously unsolved tasks.
It seems clear that many people do not understand that this is the key breakthrough: solving arbitrary tasks after learning previous, unrelated tasks.
In my opinion, that really is a good definition of intelligence, and it puts this technique at the forefront of machine intelligence.
Flipping a pancake in a "random kitchen" would be much more difficult and have many of the same issues as the door problem.
It's hard to point to a single thing that would make "flipping pancakes" intractable; it's sort of the other way around: usefully flipping pancakes the way a person does takes a lot of skills chained together.
The "door problem" is a sort of compendium of many real-world skills: identifying the door, understanding its affordances and how to grip and manipulate them, deciding whether to push or pull the door, predicting the trajectory of the door as it opens, estimating the mass of the door and applying the right amount of force, and understanding whether there are any springs or closers on the door and how it must be held to pass through it. Etc. There are also a ton of things I'm missing that are so fundamental one tends to take them for granted, like knowing your own size and that you can't fit through a tiny doorway.
I think you can ramp towards the "door problem" in difficulty by slowly relaxing constraints. A video linked above (not the article) shows a robot that "can flip a pancake successfully with a particular pan (you are already holding) and pancake with a fixed camera and visual markers". OK, now do it in varying lighting conditions. With no visual markers. With different camera views. Different pancakes. Real pancakes (which are not rigid, and sometimes stick to the pan). Different pans. Now you have to pick up the pan. Use a stove. Different stoves. Identify griddle vs. pan and use the right flipping technique. Find everything and do it all in a messy kitchen... eventually you're getting into the same ballpark as the "door problem".
Physicist here (so very naive on these topics): I’m wondering how the steps you mention for the door problem (especially the predictive ones, e.g. the trajectory of the door as it opens) compare with how humans open doors. Surely people don’t stop in front of a door and begin planning things out; rather, they seem to just go for it and adjust on the fly. Is this an approach that won’t work in robotics? Why not?