by algo_trader 1800 days ago
Can you share some recent references?

(Are you referring to the early papers showing that MPC and LQR solve SOME problems faster?!)

1 comment

One example of model-based RL is "World Models" by Ha and Schmidhuber. They pre-train an autoencoder to reduce image observations to vectors, then pre-train an RNN to predict future reduced vectors, then use a parameter-space global optimization algorithm (though any RL algorithm would work) to train a policy that is linear in the concatenation of the observation vector and the RNN hidden state.

The important thing here is that the image encoder and the RNN weren't trained end-to-end with the policy. The learned "features" captured enough information to be an effective policy input, even though they only needed to be useful for predicting future states.
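
To make the setup concrete, here is a rough sketch of a linear policy on top of frozen features. Everything in it is an assumption for illustration: the encoder and RNN are random fixed matrices standing in for the pre-trained models, the task and its return are a made-up toy, and the optimizer is a simple elitist random search rather than the optimizer used in the paper. The only point is that the search touches the policy parameters and nothing else.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, Z_DIM, H_DIM, ACT_DIM = 64, 8, 16, 2

# Frozen stand-ins for the pre-trained encoder and RNN (random weights here;
# in the real setup they would come from the two pre-training stages).
W_enc = rng.normal(scale=0.1, size=(Z_DIM, OBS_DIM))
W_hz = rng.normal(scale=0.3, size=(H_DIM, Z_DIM))
W_hh = rng.normal(scale=0.3, size=(H_DIM, H_DIM))

def encode(obs):
    """Reduce an observation to a small vector z."""
    return np.tanh(W_enc @ obs)

def rnn_step(h, z):
    """Advance the recurrent hidden state by one step."""
    return np.tanh(W_hh @ h + W_hz @ z)

# The policy is linear in [z, h] plus a bias term.
N_PARAMS = ACT_DIM * (Z_DIM + H_DIM + 1)

def policy(params, z, h):
    W = params.reshape(ACT_DIM, Z_DIM + H_DIM + 1)
    return W @ np.concatenate([z, h, [1.0]])

# Hypothetical toy task: match a fixed linear function of each observation.
observations = rng.normal(size=(20, OBS_DIM))
W_target = rng.normal(scale=0.1, size=(ACT_DIM, OBS_DIM))

def episode_return(params):
    h = np.zeros(H_DIM)
    total = 0.0
    for obs in observations:
        z = encode(obs)
        action = policy(params, z, h)
        total -= np.sum((action - W_target @ obs) ** 2)
        h = rnn_step(h, z)
    return total

def train(generations=50, pop_size=16, sigma=0.1):
    """Elitist random search over policy parameters only."""
    best = np.zeros(N_PARAMS)
    best_return = episode_return(best)
    for _ in range(generations):
        candidates = best + sigma * rng.normal(size=(pop_size, N_PARAMS))
        returns = np.array([episode_return(c) for c in candidates])
        if returns.max() > best_return:
            best, best_return = candidates[returns.argmax()], returns.max()
    return best, best_return

initial_return = episode_return(np.zeros(N_PARAMS))
best_params, best_return = train()
print(initial_return, best_return)
```

W_enc, W_hz and W_hh are never updated, which is the "not trained end-to-end" part: the policy has to make do with whatever the features already contain.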

It's also interesting that the image encoder was trained separately from the RNN. I think that only worked because the test environments were "almost" fully observable - there is world state that cannot be inferred from a single image observation, but knowing that state is not necessary for a good policy.
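
The two-stage training can be sketched in the same spirit. This is a deliberately simplified stand-in, not the paper's models: a linear autoencoder (PCA via SVD) plays the role of the image encoder, a least-squares linear predictor plays the role of the RNN, and the data is a hypothetical toy where a 2-D rotating state is mixed into 32-D "frames". The encoder is fit first using reconstruction only, then frozen, and the predictor is fit on the resulting latents.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, K = 200, 32, 2  # time steps, frame size, latent size (all hypothetical)

# Toy data: a 2-D state rotating at a fixed rate, mixed into D-dim frames.
theta = 0.1
A_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
state = np.zeros((T, K))
state[0] = [1.0, 0.0]
for t in range(1, T):
    state[t] = A_true @ state[t - 1]
mixing = rng.normal(size=(K, D))
frames = state @ mixing + 0.01 * rng.normal(size=(T, D))

# Stage 1: fit the encoder alone (linear autoencoder, i.e. PCA).
mean = frames.mean(axis=0)
_, _, Vt = np.linalg.svd(frames - mean, full_matrices=False)
encoder = Vt[:K].T  # shape (D, K), frozen from here on

def encode(x):
    return (x - mean) @ encoder

# Stage 2: fit a next-latent predictor on the frozen encoder's outputs.
z = encode(frames)
inputs = np.hstack([z[:-1], np.ones((T - 1, 1))])  # latents plus a bias column
W_pred, *_ = np.linalg.lstsq(inputs, z[1:], rcond=None)

pred_error = np.mean((inputs @ W_pred - z[1:]) ** 2)
baseline_error = np.mean((z[:-1] - z[1:]) ** 2)  # "predict no change"
print(pred_error, baseline_error)
```

In this toy every frame determines the full state, so the separately trained encoder loses nothing the predictor needs. When a single frame doesn't determine the state, this kind of staging has no way to tell the encoder what the predictor will need later.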