by treespace8 1005 days ago
This looks impressive. Much more than even the Boston Dynamics demonstrations.

Flipping a pancake is extremely difficult because each pancake is different. I know that these videos must be cherry-picked, but being able to train a robot to do this just by demonstration feels like a massive leap.


Flipping a pancake was done in 2010. What looks impressive to humans is easy for robots and vice versa: https://youtu.be/W_gxLKSsSIE?si=HDyNXe1Ys_eFXiVU Another case in point: robot juggling was done in the 1990s, and to date we do not have a robot that can open any door as reliably as a human. Kind of like Moravec's paradox.
To be fair it is far more complex for a robot to grip a spatula and use that spatula on a griddle than to use dynamic motion to flip a pancake in a pan.
Ehhh.

Solving any one problem with robotic manipulation isn’t all that hard. It takes a lot of trial and error, but in general if the task is constrained you can solve it reliably. The trick is to solve *new* tasks without resorting to all that fine tuning every time. Which is what Russ is claiming here. He’s training an LLM with a corpus of one-off policies for solving specific manipulation tasks, and claiming to get robust ad hoc policies from it for previously unsolved tasks.

If this actually works, it’s pretty important. But that’s the core claim: that he can solve ad hoc tasks without training or hand tuning.

  > He’s training an LLM with a corpus of one-off policies for solving specific manipulation tasks, and claiming to get robust ad hoc policies from it for previously unsolved tasks.
It seems clear that many people do not understand that this is the key breakthrough: solving arbitrary tasks after learning previous, unrelated tasks.

In my opinion that really is a good definition of intelligence, and puts this technique at the forefront of machine intelligence.

Is the pancake and spatula problem actually that constrained though?

I know it isn’t as open ended as plenty of more important problems in robotics, but this doesn’t strike me as easy at all.

I’ve only dabbled in robotics as an entry-level hobbyist, so I really don’t know the answer.

It’s constrained enough to be tractable.
Fair enough. When would you say it stops being tractable? What single, practical thing could we add to this problem to make it intractable?
Yes! In layman's terms: is the most efficient way to train these robots by showing them billions of videos of how it's done?
Almost certainly not. Because the sense of touch is an important part of the problem and that data isn’t present in videos.
Not just touch but proprioception. Robots in human environments will have to be better at proprioception than 98% of humans. If I bump into you it’s typically anything from annoying to a meet-cute. I’m a pretty big guy, but if you had to choose between me and somebody else stepping on your foot, it’s probably me you want, because I will shift my weight off your foot before you even know what happened (tai chi), and you will barely notice.

If instead your choice is your high school bully or a robot, well for now pick the bully. Because that robot isn’t even being vicious and will hurt more.

> Because that robot isn’t even being vicious and will hurt more.

Rodney Brooks at the MIT AI Lab was a big advocate of something called "series elastic actuators." The idea was that you didn't allow motors to directly turn robot joints. Instead, all motors acted through some kind of elastic element. And the robots could also measure how much resistance they encountered and back off.

MIT had a number of demos of robots that played nicely around fragile humans. I remember video of a grad student stopping a robot arm with their head.

Now, using series elastic actuators will sacrifice some amount of speed or precision. You wouldn't want to do it for industrial robots. And of course, robots also tend to be heavy and made of metal, so even if they move gently, they still pose real risks.

But real progress has been made on these problems.
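A rough sketch of the series elastic idea described above, under stated assumptions: the spring is treated as ideal (Hooke's law), and the class name, stiffness, torque limit, and step size are all hypothetical values chosen for illustration, not anything from the MIT robots.

```python
from dataclasses import dataclass


@dataclass
class SeriesElasticJoint:
    """Toy model: the motor drives the joint through a spring (hypothetical)."""

    stiffness: float     # spring constant in N*m/rad (assumed value)
    torque_limit: float  # resistance above which we back off, in N*m (assumed)

    def estimated_torque(self, motor_angle: float, joint_angle: float) -> float:
        # Hooke's law: torque is proportional to the spring deflection,
        # so measuring two angles gives a cheap estimate of resistance.
        return self.stiffness * (motor_angle - joint_angle)

    def next_motor_angle(
        self,
        motor_angle: float,
        joint_angle: float,
        target_angle: float,
        step: float = 0.01,
    ) -> float:
        torque = self.estimated_torque(motor_angle, joint_angle)
        if abs(torque) > self.torque_limit:
            # Too much resistance (e.g. a person in the way):
            # move the motor so the spring relaxes.
            direction = -1.0 if torque > 0 else 1.0
            return motor_angle + direction * step
        # Otherwise keep moving toward the target, one small step at a time.
        if motor_angle < target_angle:
            return min(motor_angle + step, target_angle)
        return max(motor_angle - step, target_angle)


joint = SeriesElasticJoint(stiffness=100.0, torque_limit=2.0)
blocked = joint.next_motor_angle(motor_angle=0.05, joint_angle=0.0, target_angle=1.0)
free = joint.next_motor_angle(motor_angle=0.0, joint_angle=0.0, target_angle=1.0)
```

In the blocked case the joint has not followed the motor, the spring is wound up, and the controller backs off. In the free case it just keeps stepping toward the target. The speed and precision trade-off mentioned above shows up here as the extra compliance between motor and joint.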

I think you're probably right, and those non-linear systems will force me to increase my estimate for how long it takes a robot to go from five-year-old-child to ninja physicality. The more complex the feedback mechanisms, the more complexity there is in, for instance, screwing in a screw as fast as possible.
The robot won't take any enjoyment out of it, and won't laugh at your pain. Won't post about it on social media. Isn't going to try and fuck your ex or sister or mom.

I'll take the robot, thanks.

Your friends will though.
"friends"
I'm pretty sure that if I had never opened a door before and I saw somebody opening a door in a video, I would immediately know how to open doors just by watching the video. And that would be any door, with any kind of door handle. Not because I got superpowers, but because I'm average-human.

So, the moment your system needs this kind of data and that kind of data, oh and btw it needs a few hundreds of thousands of examples of all those kinds of data, it's very clear to me that it's far away from being capable of any kind of generalisation, any kind of learning general behaviour.

So that's "60 difficult, dexterous skills" today, "1,000 by the end of 2024", and when the aliens land to explore the ruins of our civilisation your robot will still be training on the 100,000th "skill". And it will still fall over and die when the aliens greet it.

Can you train a robot to imagine touch by showing it what touch would feel like in many video scenarios?
I think their robot has a way of converting touch to a video input. The white bubble manipulator has a pattern printed on the inside that a camera watches for movement. (see 1:58 of the video).
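A minimal sketch of how a camera watching a printed pattern could be turned into a touch signal. This is a guess at the general principle, not the actual sensor pipeline: it assumes the pattern's dots have already been detected and matched between a no-contact reference frame and the current frame, and the function names and pixel threshold are hypothetical.

```python
import math


def mean_displacement(reference, current):
    """Average (dx, dy) shift of matched dots, in pixels."""
    if not reference or len(reference) != len(current):
        raise ValueError("need two non-empty, equally sized lists of dots")
    n = len(reference)
    dx = sum(c[0] - r[0] for r, c in zip(reference, current)) / n
    dy = sum(c[1] - r[1] for r, c in zip(reference, current)) / n
    return dx, dy


def contact_detected(reference, current, threshold_px=1.5):
    # If the membrane deforms, the dots move; a large average shift is
    # treated as contact. The threshold is an assumed value.
    dx, dy = mean_displacement(reference, current)
    return math.hypot(dx, dy) > threshold_px


rest = [(0.0, 0.0), (10.0, 0.0)]
pressed = [(3.0, 4.0), (13.0, 4.0)]
touching = contact_detected(rest, pressed)
idle = contact_detected(rest, rest)
```

A real sensor would presumably look at the per-dot displacement field rather than just the mean, since that is what carries the shape and direction of the contact.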
And here I thought manual labor jobs were safe for a very long time. I really hope people at the policy level are thinking about what it looks like to have a world of people that don’t have any work to do.