Hacker News
by glalonde 2856 days ago
What about pose estimation? E.g., given a well-defined coordinate system (say, the origin is the nose on a face), determine the pose of the face. Is this still best done with classic optimization formulations like RANSAC/ICP and a supplied model, or have these been bested by learned models somehow?

Don't think it's exactly what you're talking about (I'm sure there are other works much closer to what you have in mind, just can't recall off the top of my head) — but you might find PoseNet (https://www.cv-foundation.org/openaccess/content_iccv_2015/p...) interesting. Not explicitly 3D, but estimates where in a large-scale scene a picture was taken using an end-to-end convolutional network.

With that said, I think there's still a ton of merit in classical geometric approaches like ICP — there's a real, geometric basis to why they work. Convolutional networks can demonstrate some pretty amazing results, but they're still mostly "black boxes" to us, and a consequence of this is that it's hard to understand why they work and predict when they'll fail. This blog post (by the PoseNet author, actually) articulates the viewpoint well: https://alexgkendall.com/computer_vision/have_we_forgotten_a.... One recent research direction that I personally find really fascinating is designing deep learning architectures around real geometric properties, e.g. as in Skydio's deep stereo work: https://arxiv.org/pdf/1703.04309.pdf
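
To make the "real, geometric basis" point concrete, here is a minimal sketch of the least-squares rigid alignment step that ICP repeats after each round of nearest-neighbour matching. It is a simplified 2D version that assumes the correspondences are already known; the function name `fit_rigid_2d` and the point-list layout are my own choices for illustration, not taken from any of the linked work.

```python
import math

def fit_rigid_2d(src, dst):
    """Least-squares rotation + translation mapping src points onto dst.

    src and dst are equal-length lists of (x, y) pairs, and src[i] is
    assumed to correspond to dst[i] (full ICP would re-estimate these
    matches on every iteration).
    """
    n = len(src)
    # Centroids of both point sets.
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(p[0] for p in dst) / n
    dy = sum(p[1] for p in dst) / n

    # Accumulate dot and cross products of the centered point pairs.
    dot = 0.0
    cross = 0.0
    for (ax, ay), (bx, by) in zip(src, dst):
        ax, ay = ax - sx, ay - sy
        bx, by = bx - dx, by - dy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx

    # The angle that maximizes the alignment between the centered sets.
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)

    # Translation that carries the rotated source centroid onto the target centroid.
    tx = dx - (c * sx - s * sy)
    ty = dy - (s * sx + c * sy)
    return theta, (tx, ty)
```

Every quantity here has a direct geometric reading (centroids, an angle, an offset), which is a big part of why failures of this family of methods are comparatively easy to diagnose.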

PoseNet on TensorFlow.js does nice head tracking. One can get a rough head pose from the nose/eyes/ears keypoints, but it's crufty.

[1] Web-browser demo: https://storage.googleapis.com/tfjs-models/demos/posenet/cam...
[2] Github: https://github.com/tensorflow/tfjs-models/tree/master/posene...
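
For what "rough head pose from nose/eyes/ears" might look like, here is a hedged sketch of the kind of heuristic I mean. It assumes the five face keypoints have already been pulled out into a dict of `(x, y)` image coordinates keyed by part name; the function name `rough_head_pose` and that dict layout are hypothetical. The yaw value is a unitless proxy, not a calibrated angle.

```python
import math

def rough_head_pose(kp):
    """Return (roll_degrees, yaw_ratio) from 2D face keypoints.

    kp maps "nose", "leftEye", "rightEye", "leftEar", "rightEar" to
    (x, y) pixel coordinates with y growing downward. This layout is an
    assumption made for the sketch.
    """
    # Order eyes and ears by image x so the result does not depend on mirroring.
    eye_a, eye_b = sorted([kp["leftEye"], kp["rightEye"]])
    ear_a, ear_b = sorted([kp["leftEar"], kp["rightEar"]])

    # Roll: tilt of the line through the eyes relative to the image x axis.
    roll = math.degrees(math.atan2(eye_b[1] - eye_a[1], eye_b[0] - eye_a[0]))

    # Yaw proxy: nose offset from the ear midpoint, measured along the ear
    # axis and normalized by half the ear-to-ear distance.
    ex, ey = ear_b[0] - ear_a[0], ear_b[1] - ear_a[1]
    width = math.hypot(ex, ey)
    if width == 0:
        raise ValueError("ear keypoints coincide; cannot estimate yaw")
    mid_x = (ear_a[0] + ear_b[0]) / 2
    mid_y = (ear_a[1] + ear_b[1]) / 2
    nose_x, nose_y = kp["nose"]
    along_axis = ((nose_x - mid_x) * ex + (nose_y - mid_y) * ey) / width
    yaw_ratio = along_axis / (width / 2)
    return roll, yaw_ratio
```

A `yaw_ratio` near 0 means the nose sits midway between the ears (roughly frontal), and values toward ±1 mean it is lining up with one ear. The crufty part is everything this ignores: pitch, perspective, and the ear keypoints getting unreliable once the head turns far enough to hide one of them.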