Hacker News new | ask | show | jobs
by ks1723 439 days ago
I my view, the 'exactly' is crucial here. They do implicitly tell the model what to do by encoding it in the reward function:

In Minecraft, the team used a protocol that gave Dreamer a ‘plus one’ reward every time it completed one of 12 progressive steps involved in diamond collection — including creating planks and a furnace, mining iron and forging an iron pickaxe.

This is also why I think the title of the article is slightly misleading.

1 comments

It's kind of fair, humans also get rewarded for those steps when they learn Minecraft
But they don't learn that way at all, my 7yo learns by watching youtubers. There's a whole network of people teaching each other the game, that's almost more fun than playing it alone.