by talldrinkofwhat 889 days ago
I'm about a decade out, so I'm sure the SOTA has moved the goal-posts, but the work I did in grad(ish) school revolved around making end-effectors (fancy word for robot hands) that could both grab a styrofoam cup, and grab a wet glass (cylindrical, no pint-cheaters). Plenty of cups were crushed. Plenty of glasses were dropped. Nothing that did both happened by the time I graduated.

The amount of feedback built into your end-effectors (pedantic word for human hands) is insane. If you're not familiar, proprioception is a good google/wiki hole. Most of the signals that allow you to move your hands don't even hit the brain stem, let alone the boss upstairs.

The challenge mostly lies in how we've instrumented these things. Precision requires tight tolerances. Tight tolerances + unexpected environment == you've just driven your robot through the countertop/pan/coworker or broken a very nicely geared servo.
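To put rough numbers on the cup-versus-wet-glass problem, here is a minimal sketch assuming a simple two-finger pinch held by Coulomb friction. The function names, masses, friction coefficients, and crush limits are all illustrative assumptions, not measurements from the work described above.

```python
G = 9.81  # gravitational acceleration, m/s^2


def min_grip_force(mass_kg, friction_coeff, safety_factor=1.5):
    """Smallest normal force (N) per finger that holds the object.

    Assumes a two-finger pinch: each contact supplies friction mu * F,
    so holding against gravity needs 2 * mu * F >= m * g.
    """
    return safety_factor * mass_kg * G / (2.0 * friction_coeff)


def grip_window(mass_kg, friction_coeff, crush_force_n, safety_factor=1.5):
    """Return (f_min, f_max) in newtons, or None if no safe force exists."""
    f_min = min_grip_force(mass_kg, friction_coeff, safety_factor)
    if f_min > crush_force_n:
        return None  # any force that holds the object also crushes it
    return (f_min, crush_force_n)


# Hypothetical objects: a light, fragile, grippy cup and a heavy, sturdy, slippery glass.
cup = grip_window(mass_kg=0.25, friction_coeff=0.5, crush_force_n=5.0)
wet_glass = grip_window(mass_kg=0.6, friction_coeff=0.15, crush_force_n=200.0)

# With these made-up numbers the two windows do not overlap:
# the most the cup tolerates is less than the least the glass needs.
no_single_force_works = cup[1] < wet_glass[0]
```

Under these assumptions no single fixed grip force handles both objects, so the hand has to sense which one it is holding and adapt, which is where the feedback comes in.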


Adding to that, even if we had perfect end-effectors with a good sense of touch, understanding the real world enough to manipulate it is hard.

These days we have 3D cameras, but they still only see part of the objects we want to manipulate. The back side is hidden. So you need to either specify and model all objects to interact with, or have some sort of world model that can predict what the full object is like: its weight, center of gravity, surface texture, etc.

And before we even decide to manipulate it, we have to detect it, categorize it and segment it (where does the pan stop and the stove begin?). We have to plan out a manipulation task, including finding grasp points, finding movement patterns that do not interfere with the rest of the environment, etc.

It's a whole bunch of separate problems that need solving all at once. There's motor control, building the right manipulators with the right sensors, bringing all the sensor data into something where we can make a single decision, understanding of the world and what happens during manipulation, and higher level planning.

I realize these are difficult problems, but couldn't we simulate how the human brain approaches these situations? That is, we don't model the entire 3D world in our head, but make decisions in real-time mostly by intuition and previous knowledge. We perceive depth of objects visually, and loosely map out their position and dimensions that way. We don't need to know the center of mass of every object, but have general intuition for where to grab it (if it has a handle, etc.). We have touch sensors to determine if something is hot or cold, and thus safe to handle, but a robot could have actual temperature sensors, making this easier.

I'm far removed from this field, and speaking as a layperson, so pardon my ignorance.

The thing is that you take intuition for granted, but machine parts just have none. Programming intuition is exceedingly hard, but we are getting closer with neural networks. I'd say it's easier to program a machine to calculate the predicted centre of mass of an object than to program an algorithmic sense of intuition that outputs a suitable spot to grab the item effectively.
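As a rough illustration of the "calculate the centre of mass" half of that comparison, here is a minimal sketch that treats the object as a set of 3D points and takes their weighted mean. The function name and the uniform-density assumption are mine, and a real system would need far more than this.

```python
def estimate_centre_of_mass(points, weights=None):
    """Weighted mean of 3D points, as a crude centre-of-mass estimate.

    Assumes each point stands for an equal chunk of mass unless
    per-point weights are supplied.
    """
    if not points:
        raise ValueError("need at least one point")
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    return tuple(
        sum(w * p[axis] for p, w in zip(points, weights)) / total
        for axis in range(3)
    )


# Corners of a unit cube, standing in for a fully observed object.
cube = [(x, y, z) for x in (0.0, 1.0) for y in (0.0, 1.0) for z in (0.0, 1.0)]
full_estimate = estimate_centre_of_mass(cube)

# Only the face nearest the camera (z == 0), as a depth sensor might see it.
visible = [p for p in cube if p[2] == 0.0]
partial_estimate = estimate_centre_of_mass(visible)
```

The arithmetic is trivial. The catch is that the estimate from the visible face alone sits on that face rather than in the middle of the cube, so the hidden back side still has to be guessed somehow.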
I get that, but yeah, with ML it would be a matter of training it on raw data: objects, materials, physical properties and behaviors, etc. And then "intuition" would arise from this knowledge, and its own experience from reinforcement learning. It's the same problem as implementing self-driving in vehicles, just applied to a different domain. I'm not downplaying the difficulty, of course, but pointing out that this type of automation wouldn't be feasible if we had to classically program every scenario the robot is likely to encounter.
I don't think you're downplaying the difficulty but just completely unaware of the depth of it.

We don't even know if "intuition" would arise from the knowledge you describe, we don't know how that model would work, and even before that, collecting all the data (not to speak of the availability of all the sensors) is vastly more complex than data collection for ChatGPT or any other LLM would ever be.

>its own experience from reinforcement learning

This is a common mistake often heard from folks making the CS -> ML (RL) -> robotics transition. The reward function is given for free in RL, but in the real world, estimating the reward is a complex problem in itself. That's why RL in robotics has mostly seen success in quadrupedal locomotion; the reward function is simple (forward velocity, calculated from the IMU), but how would you calculate a reward function at 30 Hz+ for a simple task such as "chop an onion and put it in the pan"? If you can construct the reward function for that task, you most likely already have all the world states available and might as well skip RL and do something else with them, such as model-predictive control.
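To make the contrast concrete, here is a minimal sketch of the two kinds of reward. Every class, field, and threshold below is hypothetical, and the naive integration of IMU acceleration is only a stand-in for a real velocity estimate.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ImuSample:
    forward_accel: float  # m/s^2 along the body's forward axis (assumed gravity-compensated)
    dt: float             # seconds since the previous sample


def locomotion_reward(velocity, sample):
    """Update the forward velocity estimate and use it directly as the reward."""
    new_velocity = velocity + sample.forward_accel * sample.dt
    return new_velocity, new_velocity


@dataclass
class KitchenState:
    # Filling in these fields is itself the hard perception problem.
    onion_piece_sizes_cm: List[float]
    pieces_in_pan: int


def chop_onion_reward(state, max_piece_cm=1.0):
    """Sparse reward: 1.0 only when every piece is small enough and in the pan."""
    if not state.onion_piece_sizes_cm:
        return 0.0
    all_small = all(s <= max_piece_cm for s in state.onion_piece_sizes_cm)
    all_in_pan = state.pieces_in_pan == len(state.onion_piece_sizes_cm)
    return 1.0 if all_small and all_in_pan else 0.0


# One second of constant 1 m/s^2 acceleration sampled at 30 Hz.
velocity = 0.0
for _ in range(30):
    velocity, reward = locomotion_reward(velocity, ImuSample(1.0, 1.0 / 30.0))
```

The locomotion reward needs one cheap sensor reading per step. The onion reward is a few lines only because `KitchenState` is assumed to exist already, and producing that state at 30 Hz from cameras and touch is the part nobody hands you for free.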

As for intuition, see: https://en.wikipedia.org/wiki/Moravec%27s_paradox

I love this comment. It would have taken me hours to write and ended up being pages long, and hard to understand.
That's insightful, thanks. I'm indeed not aware of the complexities here. It's not my domain at all.

I love the quote at the end of that article you linked:

> As the new generation of intelligent devices appears, it will be the stock analysts and petrochemical engineers and parole board members who are in danger of being replaced by machines. The gardeners, receptionists, and cooks are secure in their jobs for decades to come.

I should've picked a safer career in gardening...

I have not been close to this field in over a decade, but this is the internet, so I will comment anyway!

I think one of the issues is that in some parts of academia, progress is made one PhD at a time. And a PhD is almost always too narrow to bring all of these fields together. I'm sure they are solvable problems, and I'm sure they will be solved. But maybe it will take some other research structure? Private? Guaranteed long-term funding for academic teams?