
Oh, wow. So really this allows prediction of a complex input, in a general sense?

If I understand correctly, in the context of that visual example, if it were trained with a moving camera and a static scene, then it would only be able to predict scene transformations caused by that moving camera. Maybe this explains why the tracking somewhat fails when the ball is moving along the grass towards the end of the scene. It doesn't "know" much about moving objects, just moving cameras? So then training with moving objects would let it predict those as well?

In what the video is showing, if it can predict perspective transforms from camera movement, like it seems to be doing, does that mean it's building something like a 3D model, or something like a depth map used for its motion prediction, somewhere in there?

I would love to see the error video for some sort of rotating 3D wireframe that it was trained on.

This whole approach of a general "predictor" seems incredible.