| HN Mirror



	by gwern 3163 days ago
	DM has already done a bunch of work on 'deep models' of environments to plan over. Use one of those and you get 'model-predictive control' and planning, and this tree extension to policy gradients would (probably) work as well. It would be pretty interesting to see what happens if you tried that sort of hybrid on ALE.
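For anyone unfamiliar with the term, here is a rough sketch of what model-predictive control over a model looks like: at each step, roll candidate action sequences through the model, score them, execute only the first action of the best sequence, then replan. Everything in it is an assumption made for illustration: `model` is a hypothetical hand-written 1-D stand-in for a learned dynamics model, `reward` is a made-up distance-to-goal score, and the planner exhaustively enumerates short plans over a small discrete action set rather than using any tree extension to policy gradients.

```python
import itertools

ACTIONS = (-1.0, 0.0, 1.0)  # small discrete action set (illustrative)

def model(state, action):
    # Hypothetical stand-in for a learned dynamics model:
    # a 1-D point that moves by `action` each step.
    return state + action

def reward(state, goal=5.0):
    # Made-up reward: negative distance to the goal position.
    return -abs(state - goal)

def mpc_action(state, horizon=5):
    # Enumerate every action sequence of length `horizon`, simulate it
    # with the model, and return the first action of the best-scoring plan.
    best_return, best_first = float("-inf"), ACTIONS[0]
    for plan in itertools.product(ACTIONS, repeat=horizon):
        s, total = state, 0.0
        for a in plan:
            s = model(s, a)
            total += reward(s)
        if total > best_return:
            best_return, best_first = total, plan[0]
    return best_first

def run(start=0.0, steps=10):
    # Closed loop: replan at every step, apply only the first action.
    # Here the "environment" is the model itself, so there is no model error.
    s = start
    for _ in range(steps):
        s = model(s, mpc_action(s))
    return s
```

With a real learned model the rollouts would be approximate, and enumeration would stop scaling past short horizons, which is where sampling or tree search would come in.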

2 comments

mannigfaltig 3163 days ago

I guess deep world models are still severely riddled with all sorts of problems: vanishing gradients, BPTT being O(T), the poor generalization ability of NNs (which is likely due to the lack of attractor-state associative recall and of concept composability), and the lack of probabilistic message passing to deal with uncertainty. Perhaps some priors about the world are also necessary to make learning tractable (such as spatial maps and tuning for the time scales that contain interesting information).


disposable_123 3163 days ago

What are the main papers from DM on this? Are you referring to "CONTINUOUS CONTROL WITH DRL"?
