by jvanderbot 826 days ago
I am fond of saying there are only two hard problems in robotics: Perception and Funding. If you have a magical sensor that answers questions about the world, and have a magic box full of near-limitless money, you can easily build any robotic system you want. If perception is "processing data from sensors and users so we can make decisions about it", then there isn't much robotics left.

Got a controls problem? forward predict using the magic sensor.

Got a planning problem? just sense the world as a few matrices and plug it into an ILP or MDP.

What did the user mean? Ask the box.

etc etc. Distilling the world into the kind of input our computers require is immensely difficult, but once that's done "my" problem (being a planning expert) is super easy. I'm often left holding the bag when things go wrong because "my" part is built last (the planning stack), and has the most visible "breaks" (the plan is bad). But 90% of the time it's traceable back to perception, or to a violated assumption about the world.

TFA is spot on - it's just not clear how to sense the world to make "programming" robotics a thing. In the way you'd "program" your computer to make lines appear on a screen or packets fly across the internet, we'd love to "program" a robot to pick up an object and put it away, but even a specious attempt to define generally what "object" and "put away" mean is still 100s of PhD theses away. So it's like we invent the entire ecosystem from scratch each time we build a new robot.
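To make the "sense the world as a few matrices and plug it into an MDP" step concrete, here is a minimal sketch of value iteration on a toy MDP. Everything in it is made up for illustration: the three states, the two actions ("careful" and "fast"), and every number in `P` and `R`. In a real system those matrices are exactly what perception would have to supply, which is the hard part.

```python
# Toy MDP, all values hypothetical. States: 0 = far, 1 = near, 2 = goal (absorbing).
# P[a][s][t] = probability of going from state s to state t under action a.
P = [
    # action 0, "careful": advances one state half the time
    [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]],
    # action 1, "fast": advances 90% of the time
    [[0.1, 0.9, 0.0], [0.0, 0.1, 0.9], [0.0, 0.0, 1.0]],
]
# R[a][s] = immediate reward for taking action a in state s (negative = cost).
R = [
    [-1.0, -1.0, 0.0],  # careful is cheap per attempt
    [-3.0, -3.0, 0.0],  # fast is expensive per attempt
]


def q_values(P, R, V, gamma):
    """Q[s][a] = R[a][s] + gamma * sum_t P[a][s][t] * V[t]."""
    n_actions, n_states = len(P), len(V)
    return [
        [R[a][s] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
         for a in range(n_actions)]
        for s in range(n_states)
    ]


def value_iteration(P, R, gamma=0.95, tol=1e-9, max_iters=10_000):
    """Return (V, policy) for a small tabular MDP."""
    V = [0.0] * len(P[0])
    for _ in range(max_iters):
        new_V = [max(qs) for qs in q_values(P, R, V, gamma)]
        delta = max(abs(x - y) for x, y in zip(new_V, V))
        V = new_V
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    Q = q_values(P, R, V, gamma)
    policy = [max(range(len(P)), key=lambda a: Q[s][a]) for s in range(len(V))]
    return V, policy


V, policy = value_iteration(P, R)
print("values:", [round(v, 3) for v in V])
print("policy:", policy)
```

The solver is a couple dozen lines; the planning really is the easy part once someone hands you trustworthy `P` and `R`.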

6 comments

I love this perspective.

It’s also made me draw parallels with my experiences with actual people, especially others in my household. With young children who are at the early “doing household chores” stage of development, there is basically constant refinement of what “clean the floor”, “put things away”, etc. _really_ mean. I know my wife and I have different definitions of these things too. Our ability to be clear and exhaustive enough upfront on the definitions to have a complete perception and set of assumptions is basically non-existent. We’re all only human! But our willingness to engage in fixing that with humans is also high. If my kids repeatedly miss a section under some chairs when vacuuming, we talk about it and know it will improve. When my Roomba does it, it sucks and can’t do its job properly. Even when hiring professional tradespeople to come do handiwork, it’s rarely perfect the first time. Not because they’re bad, just because being absolutely precise about things upfront can be so difficult.

Really there are three problems in robotics: Perception, Funding, and Cables :)
Connectors imo :)
And fasteners. I swear any automation system is 90% cables, connectors and fasteners by weight.
Totally. I worked on the electronics in robot arms for a while and EVERY TIME there was a failure in the field - it was the cables.
Only one of them is fun to manage.
Perception, right?
It's so great to read genuine yet experienced insight like this.

Like last night on Twitter I saw an opening for Robotic Behavior Coordinator at Figure. I know for sure, having analyzed this problem with "nothing else" to do for 20 years, I would crush it with humility, and humanity would profit in orders of magnitude.

But they are not set up to hand me control of the rounding error of $40M I'd like [and would pay forward], *nor would their teams listen to me, due to human nature and academ-uenza*.

Such is our loss.

(as you ~say, "reinventing the ecosystem from scratch...")

> humility

> humanity would profit in orders of magnitude

>> touché :)

>> but please believe, I would not risk ostracism on this (my favorite) forum if I were not [approaching] 100% sure.

Ah, sorry if I sounded like a douche.

Have my Y-C idea now.

here we gooooo ..!.. ;)

> even a specious attempt to define generally what "object" and "put away" mean is still 100s of PhD theses away

Is this part still true? There are widely available APIs (some even running at home on consumer-level hardware, to some extent) that can pick an object out of an image, describe what it might be useful for and where it could go.

Imagine you program a robot to "put away" a towel. Then it opens the door and finds there's a cup in the place already. Now what? Or a mouse. Or a piece of paper that looks like a towel in this lighting. Or a child.

Imagine the frustration if the robot kept returning to you saying "I cannot put this away". You'd get rid of the robot quickly. Reasoning at that level is so difficult.

But then imagine it was just a towel all along - oops, your perception system screwed up and now you put the towel in the dishwasher. Maybe this happens 1/1,000,000 times, but that person posts pictures on the internet and your company stock tanks.
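A back-of-the-envelope sketch of why a 1-in-1,000,000 error still bites at scale. The fleet size and usage numbers below are hypothetical, and the calculation assumes every "put away" action fails independently with the same probability, which real perception errors probably don't.

```python
import math


def prob_at_least_one_failure(p_fail, n_trials):
    """P(at least one failure) = 1 - (1 - p_fail)**n_trials, computed stably."""
    return -math.expm1(n_trials * math.log1p(-p_fail))


p_fail = 1e-6              # the 1/1,000,000 rate from the comment above
robots = 10_000            # hypothetical fleet size
actions_per_day = 20       # hypothetical "put away" actions per robot per day
days = 365

n_actions = robots * actions_per_day * days
expected_failures = p_fail * n_actions

print("actions per year:", n_actions)
print("expected failures per year:", expected_failures)
print("P(at least one):", prob_at_least_one_failure(p_fail, n_actions))
```

Under those assumptions the fleet performs tens of millions of actions a year, so at least one towel-in-the-dishwasher photo is close to a certainty.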

Most robotics companies today still use traditional tracking and filtering (e.g. Kalman filters) to help with associating detected objects with tracks (objects over time). Solving this in a fully differentiable / ML-first way for multiple targets is still WIP at most companies, since deepnet-to-detect + filtering is still a strong baseline and there are still challenges to be solved.

Occlusions, short-lived tracks, misassociations, low frame rate + high-rate-of-change features (e.g. flashing lights) are all still very challenging when you get down to brass tacks.

It's definitely not a solved problem in general, especially in realtime.
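For readers who haven't seen the "detect + filter + associate" baseline, here is a deliberately tiny sketch: a 1D constant-velocity Kalman filter per track, with greedy nearest-neighbour association inside a fixed gate. The class names, noise values, gate size and the diagonal process noise are all assumptions for illustration. A production tracker would work in 2D/3D, gate on Mahalanobis distance, and typically solve the assignment jointly instead of greedily.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Track:
    track_id: int
    pos: float
    vel: float = 0.0
    # 2x2 state covariance over (pos, vel), stored as [[P00, P01], [P10, P11]].
    P: list = field(default_factory=lambda: [[1.0, 0.0], [0.0, 1.0]])
    misses: int = 0

    def predict(self, dt=1.0, q=0.01):
        """Constant-velocity prediction: P <- F P F^T + q*I (simplified noise)."""
        (p00, p01), (p10, p11) = self.P
        self.pos += self.vel * dt
        self.P = [[p00 + dt * (p01 + p10) + dt * dt * p11 + q, p01 + dt * p11],
                  [p10 + dt * p11, p11 + q]]

    def update(self, z, r=0.1):
        """Fuse a position-only measurement z with noise variance r."""
        (p00, p01), (p10, p11) = self.P
        s = p00 + r                      # innovation variance
        k0, k1 = p00 / s, p10 / s        # Kalman gain for pos and vel
        y = z - self.pos                 # innovation
        self.pos += k0 * y
        self.vel += k1 * y
        self.P = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]
        self.misses = 0


class Tracker:
    def __init__(self, gate=2.0, max_misses=3):
        self.gate = gate                 # max |detection - prediction| to match
        self.max_misses = max_misses     # drop a track after this many misses
        self.tracks = []
        self._ids = count()

    def step(self, detections):
        for t in self.tracks:
            t.predict()
        unused = list(detections)
        for t in self.tracks:            # greedy: first track gets first pick
            best = min(unused, key=lambda z: abs(z - t.pos), default=None)
            if best is not None and abs(best - t.pos) <= self.gate:
                t.update(best)
                unused.remove(best)
            else:
                t.misses += 1            # occluded or missed detection
        self.tracks = [t for t in self.tracks if t.misses <= self.max_misses]
        for z in unused:                 # unmatched detections start new tracks
            self.tracks.append(Track(next(self._ids), z))
        return self.tracks


# One object moving at +1 per frame, one static at 10; detection order varies.
tracker = Tracker()
for k in range(5):
    dets = [float(k), 10.0] if k % 2 == 0 else [10.0, float(k)]
    tracker.step(dets)
for t in tracker.tracks:
    print(t.track_id, round(t.pos, 2), round(t.vel, 2))
```

Even this toy shows where the pain comes from: the gate, the miss budget and the greedy ordering are all hand-tuned, and each one is a place where occlusions or crossing objects cause misassociations.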

It's a lot easier to get started on something interesting and maybe even useful than it was even 10 years ago.

A lot of the "ah we can just use X API" falls apart pretty fast when you do risk analysis on a real system. Lots of these APIs do a decent job most of the time under somewhat ideal conditions; beyond that, things get hairy.

> that can pick an object out of an image

You have to do it in real time, from a video feed, and make sure that you're tracking the same unique instance of that object between frames.

Robots could make a short stop or go slower to process an unclear picture, so that is probably not the problem - but the image processing itself is still way too unreliable. Under ideal conditions it mostly works, but have some light fog in the picture or strong sunlight and ... usually it all fails.

Otherwise Teslas would indeed have a full self-driving mode, using only cameras.

>Robots could make a short stop or go slower to process an unclear picture

The costs of doing so are hugely dependent on the application. It is not, for example, an attractive strategy for an image-guided missile, though it's probably fine for an autonomous vacuum cleaner.

And then you need to grasp it.
If someone could readily do it using GPT-4V with its apparent sentience, it would be happening already. So far there have been just a few demos, and they show obvious signs of manual programming, manual remote operation, and/or even VFX editing in some cases.
That language sounds borne of hair-pulling disbelief.

If they can put ImageNet on a SOC, they can do it. [probably too big/watt]

Better yet: ImageNet bones on SOC, cacheable "Immediate Situation" fed by [the obvious logic programming that everyone glances past :) ]

> This is how Cybernetics starts y'all. <
Cute quote - added to https://github.com/globalcitizen/taoup :)

I would add supply chain, however.

To solve that:

Assumption: Apple's supply chain is gold standard [~max iterative tech envelope push & max known demand]

Hypothesis: This is swiftly re-creatable for any [max believable & max useful] product. "Detroit, waiting".

An honor! Pleased to contribute.
What about transformers for robotics, like ALOHA? They seem to help with learning new tasks.