They say “standard” video is the source, so it would likely be on the order of 30 or 60 fps. The input seems to be around a couple hundred frames, give or take, though I suspect it could get _something_ out of fewer frames, and more would just incrementally improve the model.
I would expect that minor textural differences in a hand-drawn or painted source would make it a lot harder to correlate points between frames, but it’s an interesting idea to think about!