| HN Mirror



	by kibwen 295 days ago
	For LLMs, the input is text, and the output is text. By the time of GPT-2, the internet contained enough training data to make training an interesting LLM feasible (as judged by its ability to output convincing text). We are nowhere near the same for autonomous robots, and it's not even funny. To continue to use the internet as an analogy for LLMs, we are pre-ARPANET, pre-ASCII, pre-transistor. We don't even have the sensors that would make safe household humanoid robots possible. Any theater from robot companies about trying to train a neural net based on motion capture is laughably foolish. At the current rate of progress, we are more than decades away.

5 comments

tyre 295 days ago

I would guess Amazon has a ridiculous amount of access to training data in its warehouses. Video, package sizes, weights, sorting.

I’m sure they could pretty easily spin up a site with 200 of these processing packages of most sizes (they have a limited number of standardized package sizes) nonstop. Remove ones that it gets right 99.99% of the time and keep training on the more difficult ones, then move to individual items.
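A minimal sketch of that "drop what it already gets right" filter, assuming each package type has a running tally of attempts and successes. The names `TaskStats` and `select_hard_tasks`, the 10,000-attempt minimum, and the example package types are hypothetical and only illustrate the idea; the 99.99% threshold is the one from the comment.

```python
from dataclasses import dataclass


@dataclass
class TaskStats:
    """Running tally for one package type (hypothetical structure)."""
    attempts: int = 0
    successes: int = 0

    @property
    def success_rate(self) -> float:
        # Treat an untried task as 0% so it is never filtered out early.
        return self.successes / self.attempts if self.attempts else 0.0


def select_hard_tasks(stats, threshold=0.9999, min_attempts=10_000):
    """Return the package types that should stay in the training rotation.

    A type is retired only once it has enough attempts to trust the
    estimate AND its success rate has reached the threshold.
    """
    return [
        name
        for name, s in stats.items()
        if s.attempts < min_attempts or s.success_rate < threshold
    ]


stats = {
    "small_box": TaskStats(attempts=20_000, successes=20_000),
    "poly_bag": TaskStats(attempts=20_000, successes=19_000),
    "new_item": TaskStats(attempts=5, successes=5),
}
remaining = select_hard_tasks(stats)
```

Here `small_box` is retired, `poly_bag` stays because it still fails too often, and `new_item` stays because five attempts are not enough to say anything.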

Caveat: I have no idea what I’m talking about.


eulgro 295 days ago

A more efficient way might be to train them in simulation. If you simulate a warehouse environment and use that to pre-train a million robots in parallel at 100x real time, learning would go much faster. Then you can fine-tune on reality for details missed by the simulation environment.


bmau5 295 days ago

Does your estimate account for the advancements in virtual simulation models that have been happening simultaneously? From people I speak to in the space (which I am very much not in) - they had mentioned these advancements have dramatically improved the rate of training and learning - though they also advised we're some ways off from showtime.


kibwen 295 days ago

As Tesla could tell you with their failure to deliver self-driving cars, it doesn't matter if you have exabytes of training data if it's all the wrong kind of data and if your hardware platform is insufficiently capable.


fragmede 294 days ago

Time will tell if that's true. We don't have the same corpus of data, that's true, but what we do have is the ability to make a digital twin, where the robot practices in a virtual world and sees what would happen. It can do 10,000 jumping jacks every hour, parallelized across a whole GPU supercomputer, and that data can be fed in as training data.


blackoil 295 days ago

McD must be selling millions of burgers every day, and cameras are cheap and omnipresent, so it should not be difficult to get videos for a single type of task.


kibwen 295 days ago

There is no reason to employ humanoid robots in industrial environments when it will always be easier and cheaper to adapt the environment to a specialized non-humanoid robot than to adapt robots into humanoid shape. This is true for the same reason that no LLM is ever going to beat Stockfish at chess.


ACCount37 295 days ago

Robotics has a big training data problem. But your "we don't have the sensors" claim is absolutely laughable.

It was never about the sensors. It was always about AI.


kibwen 295 days ago

No, it doesn't matter if you have a hypergenius superintelligence if it's locked in a body with no hardware support for useful proprioception. You will not go to space today.


serf 294 days ago

A 'hypergenius superintelligence' could achieve most, if not all, useful proprioception simply by looking at motor amperage draw, or if that's unavailable, then total system amperage draw.

An arm moving against gravity has a higher draw, the arc itself creates characteristics, a motion or force against the arm or fingers generates a change in draw -- a superintelligence would need only an ammeter to master proprioception, because human researchers can do this in a lab and they're nowhere near the bar of 'hypergenius superintelligence'.
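As a rough illustration of the ammeter idea, here is a sketch for a single-joint arm held still. It assumes motor torque is proportional to current (torque constant `kt`), an ideal gearbox, and no friction or acceleration. Every parameter value and the function name `estimate_external_torque` are made up for the example, not taken from any real robot.

```python
import math


def estimate_external_torque(
    current_a,
    joint_angle_rad,
    kt=0.1,              # assumed motor torque constant, N*m per amp
    gear_ratio=50.0,     # assumed ideal (lossless) gearbox
    link_mass_kg=1.0,    # assumed mass of the arm link
    com_distance_m=0.25, # assumed distance from joint to center of mass
    g=9.81,
):
    """Estimate the torque applied to the arm by something other than gravity.

    joint_angle_rad is measured from horizontal, so gravity load is
    largest at 0 and vanishes when the arm points straight up or down.
    Static case only: friction and inertia are ignored.
    """
    # Torque the motor is producing at the joint, inferred from current.
    motor_torque = kt * current_a * gear_ratio
    # Torque needed just to hold the link's own weight at this angle.
    gravity_torque = link_mass_kg * g * com_distance_m * math.cos(joint_angle_rad)
    # Whatever is left over is attributed to an external load or contact.
    return motor_torque - gravity_torque


# Holding the arm horizontal with no load: the draw matches gravity alone.
free_arm = estimate_external_torque(current_a=0.4905, joint_angle_rad=0.0)

# Same pose but drawing 1.0 A: the extra current implies something is
# pushing down on the arm.
loaded_arm = estimate_external_torque(current_a=1.0, joint_angle_rad=0.0)
```

A real arm would need friction, backlash, and dynamics terms on top of this, which is where most of the lab effort goes.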


ACCount37 295 days ago

Lmao no. Every motor is a sensor. And the better my world model is, the fewer sensors I need to keep it up.
