by ks1723
439 days ago
In my view, the 'exactly' is crucial here. They do implicitly tell the model what to do by encoding it in the reward function: in Minecraft, the team used a protocol that gave Dreamer a ‘plus one’ reward every time it completed one of 12 progressive steps involved in diamond collection, including creating planks and a furnace, mining iron and forging an iron pickaxe. This is also why I think the title of the article is slightly misleading.
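
To make that concrete, here is a rough sketch of what such a milestone reward could look like. This is not the team's code: the class name, the inventory-dict interface, and the item identifiers are made up for illustration. It lists only the steps named above plus the diamond itself rather than all 12, and paying each step only once per episode is my assumption.

```python
# Hypothetical sketch of a milestone-based reward, not the actual Dreamer setup.
# Item names are illustrative; only a subset of the 12 steps is listed.
MILESTONES = ("planks", "furnace", "iron", "iron_pickaxe", "diamond")


class MilestoneReward:
    """Pays +1 the first time each milestone item shows up in the inventory."""

    def __init__(self, milestones=MILESTONES):
        self.milestones = tuple(milestones)
        self.achieved = set()

    def reset(self):
        # Call at the start of every episode so milestones can pay out again.
        self.achieved.clear()

    def __call__(self, inventory):
        # `inventory` is assumed to map item name -> count held by the agent.
        reward = 0.0
        for item in self.milestones:
            if item not in self.achieved and inventory.get(item, 0) > 0:
                self.achieved.add(item)
                reward += 1.0
        return reward
```

The agent is never told "craft planks, then build a furnace", but the list of milestones is hand-picked by the designers, so the path to the diamond is still spelled out, just through the reward signal.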