by Matumio
1830 days ago
RL theory is cool and all, but a large part of the trick seems to be the neural network architecture itself (that is, having a good parametrized model for the policy) plus millions of evaluations. For some Atari games, simply trying random weights turned out to be almost as sample-efficient as RL. Not for everything, and policy gradient, for example, is certainly worth learning about. But see for example https://openai.com/blog/evolution-strategies/

Personally, I expect all real-world applications of RL to be trained in simulation, with tricks to make sure they also learn to adapt to reality at startup (meta-learning), for example by simulating each episode with parameters that are slightly off.
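To make the "parameters that are slightly off" idea concrete, here is a rough sketch, not taken from the linked post. It assumes a toy 1-D point mass as the simulator and a two-weight linear policy, and it uses plain random hill climbing over the weights rather than a real evolution strategy or policy gradient. All names (`sample_sim_params`, `run_episode`, `evaluate`, `random_search`) and all constants are made up for illustration.

```python
import random

def sample_sim_params(rng, spread=0.2):
    """Draw per-episode physics that are slightly off from the nominal values."""
    nominal = {"mass": 1.0, "friction": 0.1}  # hypothetical nominal simulator settings
    return {k: v * rng.uniform(1 - spread, 1 + spread) for k, v in nominal.items()}

def run_episode(weights, params, steps=100, dt=0.05):
    """Toy 1-D point mass starting at x=1; reward is higher the closer x stays to 0."""
    x, v = 1.0, 0.0
    total_reward = 0.0
    for _ in range(steps):
        force = -(weights[0] * x + weights[1] * v)  # linear policy on (position, velocity)
        accel = (force - params["friction"] * v) / params["mass"]
        v += accel * dt
        x += v * dt
        total_reward -= x * x
    return total_reward

def evaluate(weights, rng, episodes=20):
    """Average return over episodes, each with freshly randomized physics."""
    returns = [run_episode(weights, sample_sim_params(rng)) for _ in range(episodes)]
    return sum(returns) / episodes

def random_search(iterations=200, seed=0):
    """Perturb the weights at random and keep a candidate only if it scores better."""
    rng = random.Random(seed)
    best_w = [0.0, 0.0]
    best_score = evaluate(best_w, rng)
    for _ in range(iterations):
        candidate = [w + rng.gauss(0, 0.5) for w in best_w]
        score = evaluate(candidate, rng)
        if score > best_score:
            best_w, best_score = candidate, score
    return best_w, best_score

if __name__ == "__main__":
    weights, score = random_search()
    print("weights:", weights, "average return:", score)
```

Because every candidate is scored across many slightly different simulators, weights that only work for one exact mass or friction value tend to lose out. Real setups would randomize far more (latency, sensor noise, geometry) and use a proper optimizer.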