by ragebol
1801 days ago
Has robotics had such an 'ImageNet moment'? Nothing springs to mind, just slow advancement over decades. If robot manipulators could suddenly grasp any object, operate any knob or switch, tie knots, and manipulate cloth, all with the same manipulator and on first sight, that would be quite a feat. But then there's still task planning, which is a very different topic. And so on. There's so much still to develop for generally useful robots.
Just getting it to navigate itself using vision would mean building a complex system with a lot of pieces (beyond the most basic demo anyway). You need separate neural nets doing all kinds of different tasks and you need a massive training system for it all. You can see how much work Tesla has had to do to get a robot to safely drive on public roads. [2]
From where I am sitting now, I think we are making good inroads on something like an "ImageNet moment" for robots. (I should note that I am a robotics engineer, but I mostly work on driver-level software and hardware, not AI, though I follow the research from the outside.)
It seems like a combination of transformers, scale, and cross-domain reasoning like CLIP [3] could begin to build a system that could mimic humans. I guess that, as good as transformers are, we still haven't solved how to get them to learn for themselves, and that's probably a hard requirement for really being useful in the real world. There's good work in RL happening there though.
Gosh, yeah, this is gonna take decades lol. Maybe we will have a spark that unites all this in one efficient system. Improving transformer efficiency and achieving big jumps in scale are a combo that will probably get interesting stuff solved. All the groundwork is a real slog.
[1] https://reboot.love/t/new-cameras-on-rover/277
[2] https://www.youtube.com/watch?v=hx7BXih7zx8
[3] https://openai.com/blog/clip/