by zhanwei
3894 days ago
It seems to me to be all about having the right prior and planning for exploration.
Policy search methods (http://arxiv.org/abs/1504.00702) assume that only a few trajectories make sense (based on prior knowledge or testing in a simulator) and use real-world data to search for the best among them. Even within policy search you need some kind of exploration, such as injecting Gaussian noise into trajectories. The hard part is coming up with a model for exploration.
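
To make the noise-injection idea concrete, here is a rough sketch of my own, not the method from the linked paper. It perturbs a 1-D trajectory with Gaussian noise and keeps a candidate only if a toy cost improves. The names `perturb`, `cost`, and `search`, the cost function, and all parameter values are assumptions made up for illustration.

```python
import random


def perturb(trajectory, sigma, rng):
    """Return a copy of the trajectory with i.i.d. Gaussian noise on each waypoint."""
    return [x + rng.gauss(0.0, sigma) for x in trajectory]


def cost(trajectory, target):
    """Toy cost (an assumption): final-waypoint error plus a smoothness penalty."""
    end_error = (trajectory[-1] - target) ** 2
    smoothness = sum((b - a) ** 2 for a, b in zip(trajectory, trajectory[1:]))
    return end_error + 0.1 * smoothness


def search(initial, target, sigma=0.2, samples=20, iterations=50, seed=0):
    """Greedy search: explore around the current best trajectory with Gaussian noise."""
    rng = random.Random(seed)
    best = list(initial)
    best_cost = cost(best, target)
    for _ in range(iterations):
        for _ in range(samples):
            candidate = perturb(best, sigma, rng)
            candidate_cost = cost(candidate, target)
            # Keep a noisy candidate only when it lowers the cost.
            if candidate_cost < best_cost:
                best, best_cost = candidate, candidate_cost
    return best, best_cost


initial = [0.0] * 5  # the "prior": a trajectory we already believe is sensible
best, best_cost = search(initial, target=1.0)
print(best_cost)
```

Here `sigma` is the whole exploration model: too small and the search never leaves the prior, too large and most samples are wasted. That is the part that is hard to get right in a real system.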