
Yeah, it's the same general principle of using a model to cheaply speed up policy learning. An advantage of their approach, however, is that it learns a latent space and generalizes better.

The VAE learns a compressed vector, and its latent variables are somewhat meaningful. The VAE can also be sampled from, so it is not just a table of memorized examples. The RNN maintains coherence with the actions and observations of previous time-steps, and a separate controller is also learned. The end result is that their approach is richer and more flexible.
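
To make the three pieces concrete, here is a rough sketch of how the data could flow through them at each time-step. It is only an illustration under stated assumptions, not their implementation: the weights are random and untrained, the sizes (`LATENT`, `HIDDEN`, `ACTIONS`) and all function names are made up, and a plain tanh recurrent cell stands in for whatever recurrent model they actually use.

```python
import math
import random

# Hypothetical sizes, chosen only for illustration.
LATENT, HIDDEN, ACTIONS = 4, 8, 2


def rand_matrix(rows, cols, rng):
    """Small random weight matrix (untrained; stands in for learned weights)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def encode(obs, w_mu, w_logvar, rng):
    """VAE-style encoder: sample z = mu + sigma * eps rather than look up a stored example."""
    mu = matvec(w_mu, obs)
    logvar = matvec(w_logvar, obs)
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def rnn_step(h, z, action, w_h, w_in):
    """Plain tanh recurrent cell: h carries earlier observations and actions forward."""
    recurrent = matvec(w_h, h)
    inputs = matvec(w_in, z + action)  # list concatenation of latent and action
    return [math.tanh(a + b) for a, b in zip(recurrent, inputs)]


def controller(z, h, w_c):
    """Separate small controller: maps (latent, hidden state) to an action in [-1, 1]."""
    return [math.tanh(a) for a in matvec(w_c, z + h)]


def rollout(observations, seed=0):
    """Run encoder -> controller -> recurrent update over a sequence of observations."""
    rng = random.Random(seed)
    obs_dim = len(observations[0])
    w_mu = rand_matrix(LATENT, obs_dim, rng)
    w_logvar = rand_matrix(LATENT, obs_dim, rng)
    w_h = rand_matrix(HIDDEN, HIDDEN, rng)
    w_in = rand_matrix(HIDDEN, LATENT + ACTIONS, rng)
    w_c = rand_matrix(ACTIONS, LATENT + HIDDEN, rng)

    h = [0.0] * HIDDEN
    actions = []
    for obs in observations:
        z = encode(obs, w_mu, w_logvar, rng)   # compressed, sampled latent
        action = controller(z, h, w_c)         # act on latent plus memory
        h = rnn_step(h, z, action, w_h, w_in)  # fold this step into the memory
        actions.append(action)
    return actions, h
```

Calling `rollout([[0.1, 0.2, 0.3]] * 5)` returns one action per observation plus the final hidden state. The point of the split is that the controller only ever sees the small latent and hidden vectors, never the raw observation.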