Hacker News
ML turns video of a 360° turn into 3D model of a person (sciencemag.org)
96 points by mikeyanderson 2988 days ago
10 comments

Oh how I hate marketing speech.

First of all, the title should include "video of a predefined 360° turn".

And then they say something along the lines of "average accuracy of about 5mm" for fitting the reconstructed joints to their model, while you can see the body wobbling around happily.

This is an impressive demo, but gah!

Ok, we'll give it a 360° turn above.
Structure from motion is an existing technique. What is the contribution of ML in this case (it seems like joint positioning maybe?)?

https://en.m.wikipedia.org/wiki/Structure_from_motion

99%¹ of computer vision problems are 80% solved. The problem is, you need a 95+% solution to be practically useful.

Binocular stereo vision has just approached general applicability, and SfM is mostly used in very constrained environments (traffic analysis) or with large computational resources with manual correction (offline 3D mapping from aerial data).

¹ Numbers are metaphoric only, based on experience in scientific and industrial CV.

SFM does not automatically provide joint locations. Also, a casual 360 video around a subject does not provide enough data for producing a full body mesh.
How is this ML? They use a CNN for foreground segmentation, a minor step in their pipeline. But the major contribution seems to be putting the silhouettes in a common reference frame. I sincerely hope sciencemag isn’t putting ML in the title purely to jump on the bandwagon.
What a farce. The use of the ML for background subtraction is almost inconsequential to the contribution of the paper and the result.
It's someone standing in front of a green screen. You don't need ML to find a person's silhouette.
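For what it's worth, here is a rough sketch of what non-ML silhouette extraction against a green screen can look like. This is not the paper's pipeline. The function name, the `dominance` and `min_green` thresholds, and the toy frame are all made up for illustration, and real keying has to deal with spill, shadows and motion blur.

```python
def silhouette_mask(image, dominance=1.3, min_green=80):
    """Return a 2D list of bools: True where the pixel is NOT green screen.

    `image` is a list of rows of (r, g, b) tuples. The thresholds are
    illustrative assumptions, not values from the paper.
    """
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            # Call a pixel "screen" when green is bright and clearly
            # dominates both other channels.
            is_green = g >= min_green and g > dominance * max(r, b)
            mask_row.append(not is_green)
        mask.append(mask_row)
    return mask


# Toy 3x3 frame: a two-pixel "person" in front of a green backdrop.
GREEN = (30, 200, 40)
SKIN = (210, 160, 140)
frame = [
    [GREEN, GREEN, GREEN],
    [GREEN, SKIN, GREEN],
    [GREEN, SKIN, GREEN],
]
mask = silhouette_mask(frame)
```

A threshold like this falls apart as soon as the background is not a uniform colour, which is presumably where a learned segmenter starts to pay off.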
To be fair, they do have examples that aren’t chroma keyed; they just lead with one that is.

Which is not to say that ML is necessary for this sort of computer vision task, but I wonder if it yields better or sharper results than other techniques?

Same. As someone who has spent an embarrassing amount of time keying and tracking video footage over the years, I’m surprised ML isn’t being used for this more often in studios by now.
As an artist, my first thought is I wonder what happens if you try giving this a series of drawings.
you'd probably need a lot of drawings, I wonder what's the sampling rate the thing uses

it's a cool idea though :)

They say “standard” video is the source, so it would likely be on the order of 30 or 60 fps. Seems to be around a couple hundred frames, give or take, though I suspect it could get _something_ out of fewer frames, and more would just incrementally improve the model.

I would expect minor textural differences in a hand-drawn or painted source would make it a lot harder to correlate points between frames, but it’s an interesting idea to think about!

This is what should give you pause before using face authentication technology for anything.
Can you elaborate?
Makes forging facial biometrics easier.
In the case of Face ID, at least, you’d still have to transfer the measurements into the physical world, in a way that fools a system that has ostensibly been designed not to be fooled by masks.
Like a 3D printed model?
Doesn’t work for high quality face reco systems like iPhone X. You’d also need to get the IR reflectance, as well as a sign of life from the eyes.
I wonder if we will soon see a future where a director can fully edit the positions and physical actions of the actors in post-production.

basically, whole scenes will be transferred to believable 3d models seamlessly, and you can reanimate parts of everything. I feel like that's going to happen for sure, for big Hollywood productions at least (like the Marvel stuff)

This already happens a lot, most VFX heavy productions will have digital doubles of the main cast, and they can be used for as simple a reason as reframing a shot.
Your comment could give the impression this is drastically simpler to do than it is in reality. This is considered something like the last frontier of VFX, and there still remains a lot of work to be done.

While you’re essentially correct, it is currently an overwhelmingly manual process. The amount of work and time necessary is substantial (some would say outrageous), and exponentially higher for certain types of shots. Many shots remain impossible or cost-prohibitive.

It seems determined to put visible toes on everybody, no matter that they're wearing socks.

Is this a bug or a feature?

I'm going to guess they start with a generic human model that includes all limbs and extremities and then the "machine learning" process attempts to fit that model to the silhouettes extracted from the video.
Which implies that the technique uses domain knowledge of people to make assumptions about their morphology.
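To make the guess above concrete, here is a toy sketch of fitting a fixed template to an observed silhouette by brute-force search over its parameters. Everything in it is an assumption for illustration: the "person" is just a centred rectangle, the score is intersection-over-union, and none of the names come from the paper. The point is only that a template prior fills in parts the silhouette does not show.

```python
def render_template(height, width, rows=12, cols=9):
    """Rasterise a crude stand-in for a body: a centred rectangle
    standing on the bottom edge of a rows x cols grid."""
    mask = [[False] * cols for _ in range(rows)]
    top = rows - height
    left = (cols - width) // 2
    for y in range(top, rows):
        for x in range(left, left + width):
            mask[y][x] = True
    return mask


def iou(a, b):
    """Intersection-over-union of two equally sized boolean masks."""
    inter = sum(p and q for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    union = sum(p or q for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return inter / union if union else 1.0


def fit_template(observed, rows=12, cols=9):
    """Exhaustively try every (height, width) and keep the best IoU."""
    best = None
    for height in range(1, rows + 1):
        for width in range(1, cols + 1):
            score = iou(render_template(height, width, rows, cols), observed)
            if best is None or score > best[0]:
                best = (score, height, width)
    return best


# Observed silhouette: the same shape with one bottom pixel missing,
# standing in for a detail the camera did not capture.
observed = render_template(8, 3)
observed[11][3] = False

score, height, width = fit_template(observed)
fitted = render_template(height, width)
```

The best fit is still the full template, so `fitted[11][3]` is filled in even though the observation lacks it. That is the same flavour of behaviour as toes showing up on people wearing socks, if the guess about the method is right.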
This is awesome. I hope someone will implement this as a piece of open source software. Imagine the potential!
Source code seems to be available :)

https://graphics.tu-bs.de/people-snapshot

From site: "We will provide access to the code and dataset soon."
Could be used for VR phone calls between long distance couples.