Hacker News new | ask | show | jobs
by irrk 1799 days ago
Agreed, deep architectures are really only needed for feature engineering. There have been a few papers showing that even for these very deep setups, the actual policy can almost always be fully captured in a small mlp.
2 comments

Can you share some recent references?

(Are you referring to the early papers showing that MPC and LQR solve SOME problems faster ?!)

One example with model-based RL is "World Models" by Ha and Schmidhuber. They pre-train an autoencoder to reduce image observations to vectors, then pre-train a RNN to predict future reduced vectors, then use a parameter-space global optimization algorithm (but any RL algorithm would work) to train a policy that's linear in the concatenated observation vector and RNN hidden state.

The important thing here is that the image encoder and the RNN weren't trained end-to-end with the policy. The learned "features" captured enough information to be an effective policy input, even though they only needed to be useful for predicting future states.

It's also interesting that the image encoder was trained separately from the RNN. I think that only worked because the test environments were "almost" fully observable - there is world state that cannot be inferred from a single image observation, but knowing that state is not necessary for a good policy.

I don't really see the dichotomy. Surely the features you need to learn depend on the task? Or do you mean running linear / shallow rl on top of unsupervised learning?