by Matumio 1830 days ago
RL theory is cool and all, but a large part of the trick seems to be the neural network architecture itself (that is, having a good parametrized model for the policy) plus millions of evaluations. For some Atari games it turned out to be almost as sample-efficient to just try random perturbations of the weights (evolution strategies) instead of RL. That doesn't hold for everything, and policy gradient, for example, is certainly worth learning about. But see https://openai.com/blog/evolution-strategies/
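
To make the idea of searching over weights at random a bit more concrete, here is a minimal sketch of a basic evolution strategy in plain Python. It is not the code from the linked post. The function names, the hyperparameters, and the toy reward (negative squared distance to a made-up target vector, standing in for an Atari episode return) are all assumptions for illustration.

```python
import random


def evolution_strategy(reward_fn, theta, iterations=200, population=50,
                       sigma=0.1, learning_rate=0.01, seed=0):
    """Perturb the weights with Gaussian noise, score every perturbation,
    and move the weights toward the perturbations that scored better."""
    rng = random.Random(seed)
    theta = list(theta)
    n = len(theta)
    for _ in range(iterations):
        # One Gaussian noise vector per population member.
        noises = [[rng.gauss(0.0, 1.0) for _ in range(n)]
                  for _ in range(population)]
        rewards = [reward_fn([t + sigma * e for t, e in zip(theta, eps)])
                   for eps in noises]
        # Normalize rewards so the step size does not depend on reward scale.
        mean = sum(rewards) / population
        std = (sum((r - mean) ** 2 for r in rewards) / population) ** 0.5
        if std == 0.0:
            continue  # all perturbations scored the same, nothing to learn
        advantages = [(r - mean) / std for r in rewards]
        # Reward-weighted sum of the noise vectors estimates the gradient.
        for i in range(n):
            grad_i = sum(a * eps[i] for a, eps in zip(advantages, noises))
            theta[i] += learning_rate / (population * sigma) * grad_i
    return theta


def toy_reward(weights, target=(0.5, 0.1, -0.3)):
    """Hypothetical stand-in for an episode return: higher is better."""
    return -sum((w - t) ** 2 for w, t in zip(weights, target))


start = [0.0, 0.0, 0.0]
solution = evolution_strategy(toy_reward, start)
print(toy_reward(start), toy_reward(solution))
```

Note there is no backprop and no value function anywhere: the only thing the loop needs is a way to score a set of weights, which is why the number of evaluations matters so much.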

Personally I expect all real-world applications of RL to be trained in simulation, with tricks to make sure they also learn to adapt to reality at startup (meta-learning), for example by simulating each episode with parameters that are slightly off.
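
A rough sketch of what "parameters that are slightly off" could look like in practice: resample the simulator's physical constants at the start of every episode. Everything here is hypothetical, including the `SimParams` fields, the 20% spread, and the 1-D point-mass "simulator" with a proportional controller standing in for the policy. It only shows the per-episode randomization, not the meta-learning part.

```python
import random
from dataclasses import dataclass


@dataclass
class SimParams:
    mass: float = 1.0        # nominal mass of the simulated body
    friction: float = 0.1    # viscous damping coefficient
    motor_gain: float = 1.0  # scale between commanded and applied force


def randomize(nominal, rng, spread=0.2):
    """Return a copy of `nominal` with each parameter scaled by an
    independent random factor in [1 - spread, 1 + spread]."""
    def jitter(value):
        return value * rng.uniform(1.0 - spread, 1.0 + spread)
    return SimParams(jitter(nominal.mass),
                     jitter(nominal.friction),
                     jitter(nominal.motor_gain))


def run_episode(policy_gain, params, steps=200, dt=0.05):
    """Push a 1-D point mass toward position 1.0 and return the
    accumulated negative squared error (higher is better)."""
    pos, vel, total = 0.0, 0.0, 0.0
    for _ in range(steps):
        force = params.motor_gain * policy_gain * (1.0 - pos)
        acc = (force - params.friction * vel) / params.mass
        vel += acc * dt
        pos += vel * dt
        total -= (1.0 - pos) ** 2 * dt
    return total


def evaluate(policy_gain, nominal, episodes=20, seed=0):
    """Average return over episodes, each with freshly perturbed physics."""
    rng = random.Random(seed)
    returns = [run_episode(policy_gain, randomize(nominal, rng))
               for _ in range(episodes)]
    return sum(returns) / episodes


nominal = SimParams()
print(evaluate(2.0, nominal))
```

A policy scored this way can't overfit to one exact mass or friction value, which is the point: the real system then looks like just another draw from the distribution.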