by the8472 793 days ago
Animation has a much lower framerate than live video, motion can be extremely exaggerated, and the underlying shape can depend on the view, i.e. be non-Euclidean. Additionally, there are fewer high-frequency features (think leopard spots) that can serve as cues about how the global shape (the leopard's outline) moves. And of course things are drawn by humans, not captured by cameras, which means animation errors will be pervasive throughout the training data.

These things combined mean less information to learn a more difficult world model.