|
Adding to that, even if we had perfect end-effectors with a good sense of touch, understanding the real world well enough to manipulate it is hard. These days we have 3D cameras, but they still only see part of the objects we want to manipulate; the back side is hidden. So you need to either specify and model all the objects you will interact with, or have some sort of world model that can predict what the full object is like: its weight, center of gravity, surface texture, etc. And before we even decide to manipulate it, we have to detect it, categorize it and segment it (where does the pan stop and the stove begin?). We have to plan out a manipulation task, including finding grasp points, finding movement patterns that do not interfere with the rest of the environment, etc. It's a whole bunch of separate problems that need solving all at once. There's motor control, building the right manipulators with the right sensors, bringing all the sensor data together into something we can base a single decision on, understanding the world and what happens during manipulation, and higher-level planning. |
I'm far removed from this field, and speaking as a layperson, so pardon my ignorance.