


	by radarsat1 2908 days ago
	But if you look at how it uses the demonstrations, it's quite interesting. It uses them only as a series of starting points for learning. As far as I understand, it doesn't use state-action pairs at all, which is quite different from what the phrase "uses demonstration" brings to mind. It simply starts the simulation at states that its single demonstration reached. In other words, the demonstration only tells it "by this point in the game, you could get here...", and nothing about "when you are here, you should do this...". In a sense it's pretty similar to how you'd learn a game if you watched someone play it through once. (Except backwards, perhaps.)
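	To make the "starting points only" idea concrete, here is a minimal sketch under loudly stated assumptions. It is not the actual method from the work being discussed. The toy `Corridor` game, the `learn_from_demo_states` function, tabular Q-learning, and all parameter values are hypothetical choices for illustration. The demonstration is reduced to a list of states with no actions recorded. The "backwards" ordering (start from the latest demo state, move earlier once the agent can win from there) is an assumption taken from the remark above.

```python
import random


class Corridor:
    """Hypothetical toy game: positions 0..n, actions -1 (left) / +1 (right).

    Reaching position n ends the episode with reward 1. The simulator can be
    reset to any saved state, which is the only thing demo states are used for.
    """

    def __init__(self, n):
        self.n = n
        self.pos = 0

    def reset_to(self, state):
        self.pos = state
        return self.pos

    def step(self, action):
        self.pos = max(0, min(self.n, self.pos + action))
        done = self.pos == self.n
        return self.pos, (1.0 if done else 0.0), done


def learn_from_demo_states(env, demo_states, episodes_per_stage=200,
                           success_threshold=0.8, max_steps=50,
                           max_stages=100, seed=0):
    """Backward curriculum over demonstration *states*; no demo actions used.

    Episodes start from the last demo state first; once the agent succeeds
    often enough from there, the start point moves one step earlier.
    Returns the tabular Q-values and whether the earliest state was reached.
    """
    rng = random.Random(seed)
    actions = (-1, 1)
    q = {}  # (state, action) -> estimated return
    alpha, gamma, eps = 0.5, 0.95, 0.1

    def greedy(s):
        # Highest Q-value; ties broken randomly.
        return max(actions, key=lambda a: (q.get((s, a), 0.0), rng.random()))

    start_idx = len(demo_states) - 1
    for _ in range(max_stages):
        if start_idx < 0:
            break
        successes = 0
        for _ in range(episodes_per_stage):
            s = env.reset_to(demo_states[start_idx])
            for _ in range(max_steps):
                a = rng.choice(actions) if rng.random() < eps else greedy(s)
                s2, r, done = env.step(a)
                best_next = 0.0 if done else max(q.get((s2, b), 0.0) for b in actions)
                old = q.get((s, a), 0.0)
                q[(s, a)] = old + alpha * (r + gamma * best_next - old)
                s = s2
                if done:
                    successes += 1
                    break
        if successes / episodes_per_stage >= success_threshold:
            start_idx -= 1  # "from here you can win": start earlier next time
    return q, start_idx < 0


if __name__ == "__main__":
    demo = list(range(10))  # states one playthrough visited; actions discarded
    q, finished = learn_from_demo_states(Corridor(10), demo)
    print(finished, [max((-1, 1), key=lambda a: q.get((s, a), 0.0)) for s in demo])
```

	The agent still has to discover "when you are here, do this" by itself through trial and error; the demonstration only decides where each episode begins, and the start point slides toward the beginning of the game as the agent gets good at finishing from later points.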