by Sharlin 660 days ago
Uff, I guess you’re right. Mea culpa. I misread their diagram as representing inference when it was actually about training. The latter is conditioned on actions, but… how do they generate the actual output frames then? What’s the input? Is it just image-to-image based on the previous frame? The paper doesn’t seem to explain the inference part well at all :(
1 comment

It should be possible to generate an initial image from Gaussian noise, including the latent information on player position.
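
To make the idea concrete, here is a toy sketch of what such an inference loop could look like. It is not the paper's method. `toy_denoiser`, `sample_frame`, and `rollout` are hypothetical names, frames are just flat lists of floats, and the "network" is a hand-written stand-in that pretends an action scrolls the previous frame. The only point is the shape of the loop: each frame starts as Gaussian noise, is denoised conditioned on the previous frame and the action, and is then fed back in as context.

```python
import random


def toy_denoiser(noisy, context, action):
    """Hypothetical stand-in for a trained network.

    A real model would predict a clean frame from the noisy one; this toy
    ignores `noisy` and pretends the action scrolls the context frame.
    """
    k = action % len(context)
    return context[-k:] + context[:-k] if k else list(context)


def sample_frame(context, action, rng, steps=10):
    """Start from pure Gaussian noise and step toward the prediction."""
    x = [rng.gauss(0.0, 1.0) for _ in context]
    for i in range(steps):
        pred = toy_denoiser(x, context, action)
        alpha = 1.0 / (steps - i)  # equals 1.0 on the final step
        x = [xi + alpha * (pi - xi) for xi, pi in zip(x, pred)]
    return x


def rollout(first_context, actions, seed=0):
    """Autoregressive loop: each generated frame becomes the next context."""
    rng = random.Random(seed)
    frames, context = [], list(first_context)
    for action in actions:
        frame = sample_frame(context, action, rng)
        frames.append(frame)
        context = frame
    return frames


frames = rollout([1.0, 0.0, 0.0, 0.0], actions=[1, 1])
```

In this toy the very first context is handed in by the caller. Under the suggestion above, that initial frame would itself be sampled from noise, with whatever latent state (such as player position) the model carries.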