Hacker News
by amelius 2049 days ago
I suppose a wheel with spokes, when turning at the right speed, would not work due to aliasing (the wheel would appear stationary in both the original and the interpolated video).
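To make the aliasing point concrete, here is a minimal sketch of the wagon-wheel effect. The function name, the spoke count, and the frame rate are made up for illustration and are not taken from the linked project. It assumes identical, evenly spaced spokes, so the camera can only see the per-frame rotation modulo the angle between spokes.

```python
def apparent_step_deg(rev_per_sec: float, fps: float, spokes: int) -> float:
    """Apparent rotation per frame in degrees (hypothetical helper).

    A wheel with `spokes` identical spokes looks the same after every
    360/spokes degrees, so only the true per-frame rotation modulo that
    angle is observable in the sampled frames.
    """
    true_step = 360.0 * rev_per_sec / fps  # actual rotation between frames
    period = 360.0 / spokes                # rotational symmetry of the wheel
    # Fold into [-period/2, period/2): the smallest motion that explains
    # the two frames, which is what a viewer (or a model) tends to pick.
    return (true_step + period / 2) % period - period / 2


# Assumed numbers: 8 spokes filmed at 24 fps.
frozen = apparent_step_deg(3.0, 24.0, 8)    # 45 degrees per frame, one spoke gap
backward = apparent_step_deg(2.9, 24.0, 8)  # slightly less than one spoke gap

print(frozen)    # 0.0, the wheel appears frozen
print(backward)  # small negative value, the wheel appears to creep backwards
```

At exactly one spoke gap per frame, consecutive frames are identical, so any interpolator that only sees those frames has nothing to tell it the wheel is moving.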
1 comments

That's probably true, but it feels like a somewhat trivial example. I think there would be some very interesting things to test here. When you get down to something like a 16-frame interpolation of 2 stills, the model is essentially guessing from context what the interpolated frames should look like. That starts to verge into computational photography territory, where the model supplies its own interpretation of the action based on a human-like semantic understanding of the scene. As someone with an interest but not a career in bleeding-edge machine learning, I would be curious to get an intuitive sense of how much of this is going on.

Interesting boundary cases might be visuals of physical processes with inherently "chaotic" small-scale behavior. For example:

* What would happen if you fed the model two stills of a drop of food coloring expanding in water? Would it wholesale invent chaotic action that is obviously only one of many solutions but plausibly interpolates between the two states? Maybe not in it's

* Fireworks can change completely in appearance on 2-3 frame timescales. The underlying process is immediately recognizable and could easily be imagined by most people. Does the model understand the context here?

Maybe it wouldn't do well right now, but how much better could performance get on the above if there were more examples in the training set?

* In the opposite direction of chaos, it might be interesting to look at something like two photos of a starry night taken 30 minutes apart. Based on the two photos, can the model understand the geometry of the scene and rotate the points correctly?
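For the starry-night case, a rough sketch of what "rotating the points correctly" would mean, compared with naive straight-line blending of star positions. This is a simplified 2D model under stated assumptions: the celestial pole sits at a known pixel position, there is no lens distortion, and the sky turns once per sidereal day. The helper names and coordinates are hypothetical.

```python
import math

SIDEREAL_DAY_S = 86164.1  # approximate length of one sidereal day, in seconds


def rotate_about(point, pole, angle_deg):
    """Rotate a 2D image point about `pole` by `angle_deg` degrees."""
    a = math.radians(angle_deg)
    dx, dy = point[0] - pole[0], point[1] - pole[1]
    return (pole[0] + dx * math.cos(a) - dy * math.sin(a),
            pole[1] + dx * math.sin(a) + dy * math.cos(a))


def lerp(p, q, t):
    """Straight-line interpolation between two points."""
    return (p[0] + (q[0] - p[0]) * t, p[1] + (q[1] - p[1]) * t)


def radius(point, pole):
    """Distance of a point from the pole."""
    return math.hypot(point[0] - pole[0], point[1] - pole[1])


pole = (0.0, 0.0)          # assumed pixel position of the celestial pole
star_start = (100.0, 0.0)  # assumed star position in the first photo

# Rotation of the sky over 30 minutes, roughly 7.5 degrees.
total_angle = 360.0 * (30 * 60) / SIDEREAL_DAY_S
star_end = rotate_about(star_start, pole, total_angle)

mid_arc = rotate_about(star_start, pole, total_angle / 2)  # follows the arc
mid_line = lerp(star_start, star_end, 0.5)                 # cuts the chord

print(total_angle)
print(radius(mid_arc, pole), radius(mid_line, pole))
```

The straight-line midpoint lands slightly inside the circle the star actually travels on. Over 30 minutes the error is small, but it grows with the time gap, which is one way to check whether a model has picked up the rotation or is just blending positions.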

I would also be curious to know what would happen if, instead of using frames next to each other, you took them further and further apart. Of 1 second of a 30 FPS video, could you give it frames 1, 15, and 30, and ask it to find the other 27? How about with 5 seconds of a 30 FPS video, giving it frames 1, 30, 60, 90, 120, and 150? Etc.
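The bookkeeping for that experiment is easy to sketch. The helpers below are hypothetical and do not call the model at all. They only work out which frames would have to be synthesized for a given set of kept frames, using 1-based frame numbers as above.

```python
def missing_frames(total_frames, keyframes):
    """Frame numbers (1-based) the model would have to fill in."""
    keep = set(keyframes)
    return [f for f in range(1, total_frames + 1) if f not in keep]


def gap_sizes(keyframes):
    """Number of frames to synthesize between each pair of kept frames."""
    ks = sorted(keyframes)
    return [b - a - 1 for a, b in zip(ks, ks[1:])]


one_sec = [1, 15, 30]
five_sec = [1, 30, 60, 90, 120, 150]

print(len(missing_frames(30, one_sec)), gap_sizes(one_sec))
# 27 [13, 14]
print(len(missing_frames(150, five_sec)), gap_sizes(five_sec))
# 144 [28, 29, 29, 29, 29]
```

The original frames at the missing positions can then serve as ground truth, so you could plot reconstruction error against gap size and see where the output stops tracking the real motion.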

It looks like they have a Colab notebook, so perhaps I should quit writing and start playing around!

Yes, it would be good to see results with two images of fireworks.