by krasin 530 days ago
> What is the new breakthrough in robotics that is GPU-driven? There are subsets of the overall problem that can be solved by a GPU (e.g. object detection), but the whole planning and control algorithm scheme seems to be more or less the same as it has been for the past decades. These typically involve non-convex optimization, so there is not much GPU benefit.

In the past two years, two very important developments have appeared around imitation learning and LLMs. Some starting points for this rabbit hole:

1. HuggingFace LeRobot: https://github.com/huggingface/lerobot

2. ALOHA: https://aloha-2.github.io/

3. RT-2: https://robotics-transformer2.github.io/

4. 1X World Model: https://www.1x.tech/discover/1x-world-model


We've been here many times before. Imitation learning doesn't generalise and that makes it useless in practice.

Aloha is a great example of that. It's great for demos, like the one where their robot "cooked" (not really) one shrimp, but if you wanted to deploy it to real people's houses you'd have to train it for every task in every house over a few hours at a time. And "a task" is still at the level of "cook (not really) one shrimp". You want to cook (not really) noodles? It's a new task and you have to train it all over again from scratch. You want it to fold your laundry? OK but you need to train it on each piece of laundry you want it to fold, separately. You want it to put away the dishes? Without exaggeration you'd have to train it to handle each dish separately. You want it to pick up the dishes from the kitchen? Train for that. You want it to pick up the dishes from the living room? Train for that. And so on.

It sucks so much with miserable disappointment that it could bring on a new AI winter on its own, if Google was dumb enough to try and make it into a product and market it to people.

Robot maids and robot butlers are a long way away. Yeah but you can cook one shrimp (not really) with a few hours of teleoperation training in your kitchen only. Oh wow. We could never cook (not really) one shrimp before. I mean we could but this uses RL and so it's just one step from AGI.

It's nonsense on stilts.

I generally agree with your analysis of the current state of the art but strongly disagree with the overall conclusion of where it leads us.

I believe it will take on the order of 100M hours of training data from doing tasks in the real world (so, not just YouTube videos), and much larger models than we have now, to make general-purpose robotics work, but I also believe that this will happen.
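For a rough sense of scale, here is a back-of-envelope sketch of how long a robot fleet would need to log 100M hours. The fleet sizes and the 8 hours per robot per day are hypothetical numbers picked only for illustration, and `years_to_collect` is a made-up helper, not anything from the projects linked above.

```python
# Back-of-envelope: calendar time for a fleet to log 100M hours of task data.
TARGET_HOURS = 100_000_000  # the "order of 100M hours" estimate from the comment


def years_to_collect(fleet_size: int, hours_per_robot_per_day: float) -> float:
    """Calendar years for a fleet to accumulate TARGET_HOURS of real-world data."""
    hours_per_year = fleet_size * hours_per_robot_per_day * 365
    return TARGET_HOURS / hours_per_year


# Hypothetical fleet sizes (assumptions, not figures from the thread).
for fleet_size in (100, 10_000, 1_000_000):
    years = years_to_collect(fleet_size, hours_per_robot_per_day=8)
    print(f"{fleet_size:>9,} robots at 8 h/day: {years:,.2f} years")
```

Under these assumptions, a hundred robots would take centuries, while ten thousand would take a few years, so the estimate is mostly a statement about fleet size.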

I've saved your comment to my favorites and hope to revisit it in 10 years.

Thanks, that'll be interesting :)