by BrandonSmithJ
2986 days ago
Is this similar to Dyna-Q, but with the modeling/simulation handled by the RNN? It looks like the VAE is just used to produce a compressed feature vector, so the main difference seems to be the MDN-RNN, which takes the place of the usual simulated state/action transitions in Dyna-Q.
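For comparison, here is a minimal tabular Dyna-Q sketch in the style of the textbook version (Sutton and Barto). The corridor environment, the function names and all hyperparameters are hypothetical, chosen only for illustration. The point is that the "model" is a plain dictionary of memorized (state, action) outcomes, and planning just replays those remembered transitions.

```python
import random
from collections import defaultdict

def dyna_q(env_step, start_state, actions, episodes=50, planning_steps=10,
           alpha=0.1, gamma=0.95, epsilon=0.1, max_steps=200, seed=0):
    """Tabular Dyna-Q: Q-learning on real steps plus planning from a learned table model."""
    rng = random.Random(seed)
    q = defaultdict(float)  # Q[(state, action)] -> value estimate, default 0
    model = {}              # model[(state, action)] -> (reward, next_state, done), memorized

    def greedy(s):
        # Break ties randomly so early exploration is not biased toward one action.
        best = max(q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if q[(s, a)] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning update; terminal transitions do not bootstrap.
        target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s = start_state
        for _ in range(max_steps):
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            r, s2, done = env_step(s, a)   # one real interaction
            update(s, a, r, s2, done)      # direct RL update
            model[(s, a)] = (r, s2, done)  # "model learning": remember the outcome
            for _ in range(planning_steps):
                # Planning: replay a previously seen (state, action) from the table.
                ps, pa = rng.choice(list(model))
                update(ps, pa, *model[(ps, pa)])
            if done:
                break
            s = s2
    return q

def corridor_step(s, a):
    """Hypothetical deterministic corridor: states 0..5, reward 1 on reaching state 5."""
    s2 = min(max(s + a, 0), 5)
    return (1.0 if s2 == 5 else 0.0), s2, s2 == 5

q = dyna_q(corridor_step, 0, [-1, 1])
print({s: max([-1, 1], key=lambda a: q[(s, a)]) for s in range(5)})
```

With a deterministic environment the table is an exact model, but it can only ever hand back transitions it has already seen.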
The VAE learns a compressed latent vector, and its latent variables are somewhat meaningful. The VAE can also be sampled from, so it is not just a table of memorized examples. The RNN keeps its predictions coherent with the actions and observations of previous time steps, and a separate controller is learned on top. The end result is that their approach is richer and more flexible.
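To make the "can be sampled from" point concrete, here is a toy sketch of the two sampling steps involved. It is not the World Models code: the function names are hypothetical, and the inputs (mu, log_var, mixture logits, means, log-stds) are hand-picked numbers standing in for what a trained encoder and MDN-RNN would output. The VAE draws a latent with the reparameterization z = mu + sigma * eps, and an MDN head describes the next latent as a mixture of Gaussians to sample from rather than a single stored answer.

```python
import math
import random

def sample_latent(mu, log_var, rng):
    """VAE-style reparameterized sample: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def sample_mdn(logits, means, log_stds, rng):
    """Draw one value from a 1-D Gaussian mixture, as parameterized by an MDN output head."""
    # Softmax weights over mixture components (shifted by the max for numerical stability).
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    # Pick a component in proportion to its weight, then sample from that Gaussian.
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(means[k], math.exp(log_stds[k]))

if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical encoder output for one observation.
    mu, log_var = [0.5, -1.0], [-2.0, -2.0]
    # Two draws for the same input differ: new latents, not a memorized lookup.
    print(sample_latent(mu, log_var, rng))
    print(sample_latent(mu, log_var, rng))
    # Hypothetical MDN head for one latent dimension with two modes, at -1 and +1.
    print(sample_mdn([0.0, 0.0], [-1.0, 1.0], [-1.0, -1.0], rng))
```

Unlike the Dyna-Q table, repeated queries with the same input give different but plausible outputs, which is what lets such a model "dream" trajectories it never observed.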