

by GlenTheMachine 998 days ago

Ehhh.

Solving any one problem with robotic manipulation isn’t all that hard. It takes a lot of trial and error, but in general if the task is constrained you can solve it reliably. The trick is to solve *new* tasks without resorting to all that fine tuning every time. Which is what Russ is claiming here. He’s training an LLM with a corpus of one-off policies for solving specific manipulation tasks, and claiming to get robust ad hoc policies from it for previously unsolved tasks.

If this actually works, it’s pretty important. But that’s the core claim: that he can solve ad hoc tasks without training or hand tuning.


dotancohen 998 days ago

  > He’s training an LLM with a corpus of one-off policies for solving specific manipulation tasks, and claiming to get robust ad hoc policies from it for previously unsolved tasks.

It seems clear that many people do not understand that this is the key breakthrough: solving arbitrary tasks after learning previous, unrelated tasks.

In my opinion that really is a good definition of intelligence, and puts this technique at the forefront of machine intelligence.


steve_adams_86 998 days ago

Is the pancake and spatula problem actually that constrained though?

I know it isn’t as open-ended as plenty of more important problems in robotics, but this doesn’t strike me as easy at all.

I’ve only dabbled in robotics as an entry-level hobbyist, so I really don’t know the answer.


GlenTheMachine 998 days ago

It’s constrained enough to be tractable.


steve_adams_86 998 days ago

Fair enough. When would you say it stops being tractable? What single, practical thing could we add to this problem to make it intractable?


xyzzy123 998 days ago

Flipping a pancake in a "random kitchen" would be much more difficult and have many of the same issues as the door problem.

It's hard to point to a single thing that would make "flipping pancakes" intractable. It's sort of the other way around: usefully flipping pancakes the way a person does takes a lot of skills chained together.

The "door problem" is a sort of compendium of many real-world skills: identifying the door, understanding its affordances and how to grip / manipulate them, deciding whether to push or pull the door, predicting the trajectory of the door when opened, estimating the mass of the door and applying the right amount of force, understanding if there are any springs or pulls on the door and how it must be held to traverse through it, etc. There are also a ton of things I'm missing that are so fundamental one tends to take them for granted, like knowing your own size and that you can't fit through a tiny doorway.

I think you can ramp towards the "door problem" in difficulty by slowly relaxing constraints. A video linked above (not the article) shows "can flip a pancake successfully with a particular pan (you are already holding) and pancake with a fixed camera and visual markers". Ok, now do it in varying lighting conditions. With no visual markers. With different camera views. Different pancakes. Real pancakes (which are not rigid, and sometimes stick to the pan). Different pans. Now you have to pick up the pan. Use a stove. Different stoves. Identify griddle vs pan and use the right flipping technique. Find everything and do it all in a messy kitchen... eventually you're getting into the same ballpark as the "door problem".
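
One rough way to picture that ramp is as a ladder you climb until the policy stops coping. This is only a sketch: the stage names paraphrase the list above, and `conditions_at`, `highest_level_passed`, the `evaluate` callback and the 0.9 threshold are all made-up names and numbers for illustration, not anything from the video or article.

```python
from typing import Callable, List

# Hypothetical ladder: each entry relaxes one more constraint of the lab demo.
RELAXATIONS: List[str] = [
    "varying lighting",
    "no visual markers",
    "different camera views",
    "different pancakes",
    "real pancakes (non-rigid, may stick)",
    "different pans",
    "must pick up the pan",
    "must use a stove",
    "different stoves",
    "griddle vs pan technique",
    "messy kitchen",
]


def conditions_at(level: int) -> List[str]:
    """Return every constraint that has been relaxed by `level`.

    Level 0 is the original demo: fixed pan, fixed camera, visual markers.
    """
    if not 0 <= level <= len(RELAXATIONS):
        raise ValueError(f"level must be between 0 and {len(RELAXATIONS)}")
    return RELAXATIONS[:level]


def highest_level_passed(
    evaluate: Callable[[List[str]], float],
    required_success: float = 0.9,  # assumed threshold, purely illustrative
) -> int:
    """Climb the ladder until the measured success rate drops below threshold.

    `evaluate` is a stand-in for running real trials under the given
    conditions and returning a success rate between 0 and 1.
    """
    level = 0
    while level < len(RELAXATIONS):
        if evaluate(conditions_at(level + 1)) < required_success:
            break
        level += 1
    return level
```

The point of writing it this way is that "tractable" becomes a level on the ladder rather than a yes/no property of "flipping pancakes".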


ziofill 998 days ago

Physicist here (so very naive on these topics). I’m wondering how to compare the steps you mention regarding the door problem (especially the predictive ones, e.g. about the trajectory of the door as it opens, etc) with how humans open doors. Surely people don’t stop in front of a door and begin planning things out, rather they seem to go for it and adjust on the fly, is this an approach that won’t work in robotics? Why not?


xyzzy123 998 days ago

So in classical robotics, yeah, people used to write code for each step of opening a door. Practically speaking you would probably not do motion planning on the door, you would just code it up with a bunch of heuristics like: try to be over here in relation to the doorframe, because that's a good opening spot and will probably work. Ok you're in the right place? Now, move gripper towards the door handle... etc. Bunch of hacks. Put enough hacks together and you can kinda sorta open (some) doors. Oh this is a SLIDING door? Damn we forgot to code for that...
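
To make the "bunch of hacks" concrete, here is a toy sketch of what such a scripted door opener might look like. Everything in it is hypothetical (the `Door` fields, the `scripted_open` function, the step strings); it is not real robot code, just the shape of the approach, including the door type nobody coded for.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Door:
    kind: str         # hypothetical labels: "hinged_push", "hinged_pull", "sliding"
    handle_side: str  # "left" or "right"


def scripted_open(door: Door) -> List[str]:
    """Return the hand-written sequence of steps for a known door type."""
    steps = [
        # Heuristic: stand on the handle side, it's usually a good opening spot.
        f"stand near {door.handle_side} side of frame",
        "move gripper to handle",
        "grasp handle",
    ]
    if door.kind == "hinged_push":
        steps.append("push forward")
    elif door.kind == "hinged_pull":
        steps.append("step back and pull")
    else:
        # The failure mode described above: a door type we forgot to code for.
        raise NotImplementedError(f"no script for door kind {door.kind!r}")
    steps.append("drive through")
    return steps
```

Calling `scripted_open(Door("sliding", "left"))` raises `NotImplementedError`, which is about as graceful as these systems tended to be.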

The way things are going is sensors (cameras, force, etc) and neural networks. You let the robot try a bunch of ways of opening doors, sometimes it doors itself in the face, eventually it'll figure out good places to stand based on what the door looks like. The more doors you make it try to open, the better it hopefully gets at generalising over the task of opening doors. The hacks/heuristics are really still there but the robot is supposed to learn them.

> Surely people don’t stop in front of a door and begin planning things out, rather they seem to go for it and adjust on the fly, is this an approach that won’t work in robotics? Why not?

Yeah, figuring out how to do this is basically "the problem". Most people don't have a sense or feeling of "planning things out" as they open a door because we reached "unconscious competence" at that task. We definitely have predictions of what is going to happen as we start opening the door based on prior experience and our observations so far. If reality diverges from our expectations we will experience surprise, make some new predictions, take actions to resolve the surprise, etc.
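
As a very loose illustration of that predict / act / get surprised / re-predict loop, here is a toy simulation. The door model (angle change = force / resistance), the `open_door` function and all the numbers are assumptions made up for this sketch; real doors and real controllers are far messier.

```python
from typing import Tuple


def open_door(
    true_resistance: float,
    est_resistance: float = 1.0,   # prior belief from "previous doors"
    target_angle: float = 90.0,
    step: float = 10.0,            # degrees we intend to move per action
    tolerance: float = 0.5,        # how much mismatch counts as a surprise
    gain: float = 0.5,             # how strongly a surprise shifts the belief
    max_steps: int = 200,
) -> Tuple[float, float, int]:
    """Open a simulated door, adjusting the belief only when surprised.

    Returns (final angle, final resistance estimate, number of surprises).
    """
    angle = 0.0
    surprises = 0
    for _ in range(max_steps):
        if angle >= target_angle:
            break
        force = est_resistance * step      # act on the current belief
        predicted = step                   # what we expect to happen
        actual = force / true_resistance   # what the toy door actually does
        angle += actual
        if abs(actual - predicted) > tolerance:
            # Reality diverged from expectation: update the prediction.
            surprises += 1
            observed_resistance = force / actual
            est_resistance += gain * (observed_resistance - est_resistance)
    return angle, est_resistance, surprises
```

With a door that matches the prior (`true_resistance=1.0`) the loop never registers a surprise, which is roughly what "unconscious competence" looks like here; a heavier door triggers a few corrections early on and then settles.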

Not sure that anyone has ever studied how people open doors in detail; it'd be interesting. I bet there are a ton of subtle micro behaviours. One that I know is, if you hear kids running in the house it is a good idea to keep a foot planted in front of you as you approach the door, because those guys will absolutely fling or crash doors open right into your face.


dclowd9901 998 days ago

What makes you think a kitchen would have to be random? We regularly design physical spaces to accommodate robots.


xyzzy123 998 days ago

I was responding to address why the "door problem" is more difficult than "pancake flipping under controlled conditions".

(I also ignored that door opening is generally done by mobile robots of a certain weight class which tend to be more expensive than a stationary arm with enough strength to pick up a spatula or hold a pan).

There is a steep difficulty gradient from "works in the lab" to "works under semi-controlled real world conditions" to "works in uncontrolled real-world situations".
