| HN Mirror


by zith 936 days ago

Adding to that, even if we had perfect end-effectors with a good sense of touch, understanding the real world enough to manipulate it is hard.

These days we have 3D cameras, but they still only see part of the objects we want to manipulate. The back side is hidden. So you need to either specify and model all objects to interact with, or have some kind of world model that can predict what the full object is like: its weight, center of gravity, surface texture, etc.

And before we even decide to manipulate it, we have to detect it, categorize it and segment it (where does the pan stop and the stove begin?). We have to plan out a manipulation task, including finding grasp points, finding movement patterns that do not interfere with the rest of the environment, etc.

It's a whole bunch of separate problems that need solving all at once. There's motor control, building the right manipulators with the right sensors, bringing all the sensor data into something where we can make a single decision, understanding of the world and what happens during manipulation, and higher level planning.

1 comments

imiric 936 days ago

I realize these are difficult problems, but couldn't we simulate how the human brain approaches these situations? That is, we don't model the entire 3D world in our head, but make decisions in real-time mostly by intuition and previous knowledge. We perceive depth of objects visually, and loosely map out their position and dimensions that way. We don't need to know the center of mass of every object, but have general intuition for where to grab it (if it has a handle, etc.). We have touch sensors to determine if something is hot or cold, and thus safe to handle, but a robot could have actual temperature sensors, making this easier.

I'm far removed from this field, and speaking as a layperson, so pardon my ignorance.


MaxikCZ 936 days ago

The thing is that you take intuition for granted, but machine parts just have none. Programming intuition is exceedingly hard, but we are getting closer with neural networks. I'd say it's easier to program a machine to calculate the predicted centre of mass of an object than to build an algorithmic sense of intuition that outputs a suitable spot to grab the item effectively.
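As a rough illustration of why the centre-of-mass side is the "easy" part, here is a minimal Python sketch. It assumes the object has already been segmented into 3D points and that each point carries an equal share of the mass unless weights are given; the function name `centre_of_mass` and its inputs are hypothetical. With a camera that only sees the front of the object, the estimate will be biased toward the visible side.

```python
def centre_of_mass(points, masses=None):
    """Mass-weighted average of 3D points.

    points: list of (x, y, z) tuples, e.g. from a segmented point cloud.
    masses: optional per-point masses; assumed uniform if omitted.
    """
    if not points:
        raise ValueError("need at least one point")
    if masses is None:
        # Assumption: uniform density, so every point weighs the same.
        masses = [1.0] * len(points)
    if len(masses) != len(points):
        raise ValueError("points and masses must have the same length")
    total = sum(masses)
    return tuple(
        sum(m * p[axis] for p, m in zip(points, masses)) / total
        for axis in range(3)
    )


if __name__ == "__main__":
    visible_points = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    print(centre_of_mass(visible_points))
    print(centre_of_mass(visible_points, masses=[1.0, 3.0]))
```

Choosing a good grasp point has no comparably short formula, which is the contrast being drawn here.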


imiric 935 days ago

I get that, but yeah, with ML it would be a matter of training it on raw data: objects, materials, physical properties and behaviors, etc. And then "intuition" would arise from this knowledge, and its own experience from reinforcement learning. It's the same problem as implementing self-driving in vehicles, just applied to a different domain. I'm not downplaying the difficulty, of course, but pointing out that this type of automation wouldn't be feasible if we had to classically program every scenario the robot is likely to encounter.


NalNezumi 935 days ago

I don't think you're downplaying the difficulty; I think you're just completely unaware of the depth of it.

We don't even know if "intuition" would arise from the knowledge you describe, we don't know how that model would work, and even before that, collecting all the data (not to speak of the availability of all the sensors) is vastly more complex than data collection for ChatGPT or any other LLM would ever be.

>its own experience from reinforcement learning

This is a common mistake often heard from folks making the CS -> ML (RL) -> robotics transition. In RL the reward function is given for free, but in the real world, estimating the reward is a complex problem in itself. That's why RL in robotics has mostly seen success in quadrupedal locomotion: the reward function is simple (forward velocity, calculated from the IMU). But how would you calculate a reward at 30 Hz+ for a simple task such as "chop onion and put it in the pan"? If you can construct the reward function for that task, you most likely already have all the world states available and might as well skip RL and do something else with them, such as model-predictive control.
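A minimal Python sketch of that contrast. Everything here is an illustrative assumption: `ImuSample`, its fields, and both reward functions are hypothetical names, and the velocity estimate is a naive integration that ignores sensor drift.

```python
from dataclasses import dataclass


@dataclass
class ImuSample:
    forward_accel: float  # m/s^2 along the robot's heading (hypothetical field)
    dt: float             # seconds since the previous sample


def forward_velocity(samples, v0=0.0):
    """Naively integrate forward acceleration into a velocity estimate."""
    v = v0
    for s in samples:
        v += s.forward_accel * s.dt
    return v


def locomotion_reward(samples):
    """Locomotion: the reward is just the estimated forward velocity."""
    return forward_velocity(samples)


def chop_onion_reward(sensor_frame):
    """Manipulation: no cheap signal exists.

    Scoring "chop onion and put it in the pan" would need the full world
    state (onion pieces, knife pose, pan contents), which is the hard part.
    """
    raise NotImplementedError("requires a full world-state estimate")


if __name__ == "__main__":
    window = [ImuSample(1.0, 0.5), ImuSample(2.0, 0.5)]
    print(locomotion_reward(window))
```

The first reward fits in a few lines because one sensor already measures what matters; the second cannot be written without first solving perception.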

As for intuition, see: https://en.wikipedia.org/wiki/Moravec%27s_paradox


actionfromafar 935 days ago

I love this comment. It would have taken me hours to write and ended up being pages long, and hard to understand.


imiric 935 days ago

That's insightful, thanks. I'm indeed not aware of the complexities here. It's not my domain at all.

I love the quote at the end of that article you linked:

> As the new generation of intelligent devices appears, it will be the stock analysts and petrochemical engineers and parole board members who are in danger of being replaced by machines. The gardeners, receptionists, and cooks are secure in their jobs for decades to come.

I should've picked a safer career in gardening...


BoiledCabbage 935 days ago

> We should expect the difficulty of reverse-engineering any human skill to be roughly proportional to the amount of time that skill has been evolving in animals.

This other quote really jumps out at me as well, because it rings so true.

Even older than walking and manual dexterity are really basic abilities like eating. We're nowhere close on that. We're so far off it's not on anyone's radar. Robots will run on batteries or some other form of power; there is no way anyone is close to building robots that can eat, break down food, and use it for energy and repair. That's one of the oldest evolutionary traits.

The other is of course procreation. Will a robot be able to assemble a new one from pre-made parts? Likely not too far off. But could a robot build or grow one from scratch? That's so far off in the sci-fi future it's silly.


zith 935 days ago

I have not been close to this field in over a decade, but this is the internet, so I will comment anyway!

I think one of the issues is that in some parts of academia, progress is made one PhD at a time. And a PhD is almost always too narrow to bring all of these fields together. I'm sure they are solvable problems, and I'm sure they will be solved. But maybe it will take some other research structure? Private? Guaranteed long-term funding for academic teams?
