by zhanwei 3894 days ago
It seems to me to be all about having the right prior and planning for exploration. Policy search methods (http://arxiv.org/abs/1504.00702) assume that only a few trajectories make sense (based on prior knowledge or testing in a simulator) and use real-world data to search for the best ones among those. Even within policy search you need some kind of exploration, such as injecting Gaussian noise into the trajectories. The hard part is coming up with a model for exploration.
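
To make the "inject Gaussian noise" idea concrete, here is a rough sketch. It is plain random search around a prior trajectory, not the method from the linked paper, and everything in it is made up for illustration: `rollout_cost` is a hypothetical stand-in for a real-world rollout, and `noisy_search`, `sigma`, `candidates` and `iterations` are just names I picked.

```python
import random


def rollout_cost(trajectory, target):
    """Hypothetical stand-in for a real rollout: squared distance to a target."""
    return sum((x - t) ** 2 for x, t in zip(trajectory, target))


def noisy_search(prior, target, sigma=0.1, candidates=20, iterations=50, seed=0):
    """Explore around a prior trajectory by adding Gaussian noise to each point.

    Assumption: the prior is already "sensible", so small perturbations
    stay in the region of trajectories worth trying.
    """
    rng = random.Random(seed)
    best = list(prior)
    best_cost = rollout_cost(best, target)
    for _ in range(iterations):
        for _ in range(candidates):
            # Exploration step: perturb every point with N(0, sigma^2) noise.
            candidate = [x + rng.gauss(0.0, sigma) for x in best]
            cost = rollout_cost(candidate, target)
            # Keep the candidate only if it improves on the best so far.
            if cost < best_cost:
                best, best_cost = candidate, cost
    return best, best_cost


if __name__ == "__main__":
    prior = [0.0, 0.0, 0.0]
    target = [0.5, -0.2, 0.3]
    best, cost = noisy_search(prior, target)
    print(best, cost)
```

Note that the noise here is isotropic with a fixed `sigma`, which is about the dumbest exploration model you could pick. That is the point above: choosing where and how much to perturb is the hard part.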