by LostLocalMan 998 days ago
To be fair, it is far more complex for a robot to grip a spatula and use that spatula on a griddle than to use dynamic motion to flip a pancake in a pan.

Ehhh.

Solving any one problem with robotic manipulation isn’t all that hard. It takes a lot of trial and error, but in general if the task is constrained you can solve it reliably. The trick is to solve *new* tasks without resorting to all that fine-tuning every time, which is what Russ is claiming here. He’s training an LLM with a corpus of one-off policies for solving specific manipulation tasks, and claiming to get robust ad hoc policies from it for previously unsolved tasks.

If this actually works, it’s pretty important. But that’s the core claim: that he can solve ad hoc tasks without training or hand tuning.

  > He’s training an LLM with a corpus of one-off policies for solving specific manipulation tasks, and claiming to get robust ad hoc policies from it for previously unsolved tasks.
It seems clear that many people do not understand that this is the key breakthrough: solving arbitrary tasks after learning previous, unrelated tasks.

In my opinion that really is a good definition of intelligence, and puts this technique at the forefront of machine intelligence.

Is the pancake and spatula problem actually that constrained though?

I know it isn’t as open ended as plenty of more important problems in robotics, but this doesn’t strike me as easy at all.

I’ve only dabbled in robotics as an entry-level hobbyist, so I really don’t know the answer.

It’s constrained enough to be tractable.
Fair enough. When would you say it stops being tractable? What single, practical thing could we add to this problem to make it intractable?
Flipping a pancake in a "random kitchen" would be much more difficult and have many of the same issues as the door problem.

It's hard to point to a single thing that would make "flipping pancakes" intractable. It's sort of the other way around: usefully flipping pancakes the way a person does takes a lot of skills chained together.

The "door problem" is a sort of compendium of many real-world skills: identifying the door, understanding its affordances and how to grip / manipulate them, deciding whether to push or pull the door, predicting the trajectory of the door when opened, estimating the mass of the door and applying the right amount of force, understanding whether there are any springs or pulls on the door and how it must be held to traverse through it, etc. There are also a ton of things I'm missing that are so fundamental one tends to take them for granted, like knowing your own size and that you can't fit through a tiny doorway.

I think you can ramp towards the "door problem" in difficulty by slowly relaxing constraints. A video linked above (not the article) shows "can flip a pancake successfully with a particular pan (you are already holding) and pancake with a fixed camera and visual markers". Ok, now do it in varying lighting conditions. With no visual markers. With different camera views. Different pancakes. Real pancakes (which are not rigid, and sometimes stick to the pan). Different pans. Now you have to pick up the pan. Use a stove. Different stoves. Identify griddle vs pan and use the right flipping technique. Find everything and do it all in a messy kitchen... eventually you're getting to the same ballpark as the "door problem".
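
If it helps, here is a rough sketch of that ramp as a list of stages, each one relaxing a single extra constraint on top of the previous stage. The stage names are just my paraphrase of the list above, and `RELAXATIONS` and `curriculum` are made-up names for illustration, not anything from the video or a real benchmark.

```python
# Hypothetical sketch: each stage relaxes one more constraint from the demo setup.
# The names below paraphrase the comment above; they are not from a real benchmark.
RELAXATIONS = [
    "varying lighting",
    "no visual markers",
    "different camera views",
    "different pancakes",
    "real (non-rigid, sticky) pancakes",
    "different pans",
    "must pick up the pan",
    "must use a stove",
    "different stoves",
    "griddle vs. pan technique",
    "messy kitchen",
]


def curriculum(relaxations):
    """Yield (stage_number, constraints_relaxed_so_far) for each step of the ramp."""
    relaxed = []
    yield 0, tuple(relaxed)  # stage 0: the fully constrained demo
    for i, r in enumerate(relaxations, start=1):
        relaxed.append(r)
        yield i, tuple(relaxed)


stages = list(curriculum(RELAXATIONS))
for n, relaxed in stages:
    print(f"stage {n}: {len(relaxed)} constraints relaxed")
```

The point of writing it cumulatively is that every stage keeps all the earlier relaxations, so a policy that passes a late stage has to handle everything before it at once.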

Physicist here (so very naive on these topics). I’m wondering how the steps you mention for the door problem (especially the predictive ones, e.g. about the trajectory of the door as it opens) compare with how humans open doors. Surely people don’t stop in front of a door and begin planning things out; rather, they seem to go for it and adjust on the fly. Is this an approach that won’t work in robotics? Why not?
What makes you think a kitchen would have to be random? We regularly design physical spaces to accommodate robots.