by gwern 1495 days ago

Gato, a Decision Transformer on steroids, is pretty much what you would expect, with the expected RL scaling curves†, if you've been following ML scaling research for the past 2 years. It is, however, still mind-blowing to see it in reality.

And note that it's only as small (and thus, weak) as it is because they want to run it directly on robots ("We focus our training at the operating point of model scale that allows real-time control of real-world robots, currently around 1.2B parameters").

† https://storage.googleapis.com/deepmind-media/A%20Generalist... looks just like any scaling curve from a text or vision paper...

Also submitted at https://news.ycombinator.com/item?id=31355657

2 comments

extr 1495 days ago

Hi Gwern, I'm the submitter of the other thread. It was quite coincidental to wake up to this announcement this morning, because the last thing I read before bed was your "Clippy" story: https://www.gwern.net/fiction/Clippy

It was a good read, thanks!

gwern 1494 days ago

Glad you liked it. I've added Gato as a reference to the story, of course. :)

optimalsolver 1495 days ago

Did they try it on Montezuma's Revenge?

gwern 1494 days ago

They use ALE '51' instead of 57, so I assume not. (Because Montezuma's Revenge is pretty much purely about exploration, and given demonstrations of a successful agent wouldn't be hard, there's not much benefit to training on it here. Gato would probably get a good score, but no one would care. The hard exploration games in ALE are often left out for that reason.)

optimalsolver 1493 days ago

>and given demonstrations of a successful agent wouldn't be hard

Last I checked, the only team that has shown good performance on that game is Uber, and from what I recall they used a controversial hack that would be unlikely to generalize to other environments.

gwern 1493 days ago

Yes, the hack they used was for the exploration part: providing a state summary to explicitly decide if a state was new or not, and, in the initial Go-Explore, essentially letting the agent teleport to arbitrary states to begin exploring from there.

However, once the exploring was done, they could train an agent on the trajectories of the exploring agent to solve Montezuma's Revenge (MR) with no problem. That's why I say that MR is an exploration problem and that training on demonstrations from a player which has already solved MR would obviously work: because it does. So it wouldn't show anything interesting about Gato, because Gato would be solving the part of MR that everyone agrees is basically trivially easy.
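
As a rough illustration of the two phases described above, here is a minimal toy sketch in Python. It is not Uber's Go-Explore code: the corridor environment `ChainEnv`, the `cell_of` state summary, and the tabular "cloning" step are all hypothetical stand-ins chosen to keep the example small. The exploration phase keeps an archive of state summaries, restores ("teleports" to) a saved state, and explores randomly from there; the second phase simply copies the resulting trajectory into a policy.

```python
import random


class ChainEnv:
    """Hypothetical toy environment: a corridor with reward only at the far end."""

    def __init__(self, length=30):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action 1 moves right, anything else moves left; position is clamped.
        delta = 1 if action == 1 else -1
        self.pos = max(0, min(self.length, self.pos + delta))
        done = self.pos == self.length
        return self.pos, (1.0 if done else 0.0), done

    def get_state(self):
        return self.pos

    def set_state(self, state):
        # The "teleport": restore the simulator to a previously saved state.
        self.pos = state


def cell_of(obs):
    # State summary used to decide whether a state counts as new.
    # Here it is the observation itself; this is an assumption of the toy setup.
    return obs


def explore(env, iterations=5000, steps_per_iter=10, seed=0):
    """Exploration phase: return an action sequence that reaches the goal, or None."""
    rng = random.Random(seed)
    start = env.reset()
    archive = {cell_of(start): (env.get_state(), [])}
    for _ in range(iterations):
        state, traj = archive[rng.choice(list(archive))]
        env.set_state(state)
        traj = list(traj)
        for _ in range(steps_per_iter):
            action = rng.randrange(2)
            obs, _, done = env.step(action)
            traj.append(action)
            if done:
                return traj
            cell = cell_of(obs)
            # Keep a cell if it is new or was reached by a shorter trajectory.
            if cell not in archive or len(traj) < len(archive[cell][1]):
                archive[cell] = (env.get_state(), list(traj))
    return None


def clone_policy(env, demo):
    """Tabular stand-in for training on a demonstration: observation -> action."""
    policy = {}
    obs = env.reset()
    for action in demo:
        policy[obs] = action  # later visits overwrite earlier ones
        obs, _, _ = env.step(action)
    return policy


def run_policy(env, policy, max_steps=1000):
    """Follow the cloned policy from reset; return True if the goal is reached."""
    obs = env.reset()
    for _ in range(max_steps):
        obs, _, done = env.step(policy[obs])
        if done:
            return True
    return False
```

In this toy, all of the difficulty sits in `explore`; once a successful trajectory exists, `clone_policy` is a few lines, which is the sense in which learning from demonstrations is the easy part.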
