by rckoepke 2128 days ago
Apparently all current attempts have resulted in very, very uncanny valleys. This thread mentions some current attempts (searching hn.algolia.com for 'gaze correction' will return additional threads).

https://news.ycombinator.com/item?id=24151123https://news.yc...

2 comments

Even with multi-camera setups?
Seems possible. If the user is actually looking at the center of the screen then we only need to shift the view, not digitally move their eyes. That seems very doable with some GPU code.
> That seems very doable with some GPU code.

This seems about as hard as digitally moving eyes.

I think the main source of artifacts is going to be lighting and reflections. Specular highlights and reflections are only visible when the light, the surface position and normal, and the observer are arranged in a specific way. If you have two or more cameras positioned elsewhere, there's no way to find out what color would be visible to another camera in the center.
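To make the view dependence concrete, here is a minimal sketch using the classic Phong specular term. Phong is a deliberately crude stand-in (real skin and eyes are far more complicated), and the scene layout, the `shininess` value, and every function name below are assumptions made up for illustration.

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def reflect(d, n):
    # Mirror an incoming direction d about the unit normal n: r = d - 2(d.n)n
    k = 2.0 * dot(d, n)
    return tuple(di - k * ni for di, ni in zip(d, n))

def phong_specular(point, normal, light_pos, camera_pos, shininess=50.0):
    """Phong specular intensity in [0, 1] seen from camera_pos (assumed model)."""
    n = normalize(normal)
    to_light = normalize(tuple(l - p for l, p in zip(light_pos, point)))
    to_cam = normalize(tuple(c - p for c, p in zip(camera_pos, point)))
    # Light travels from the light to the surface, so the incoming direction is -to_light.
    r = reflect(tuple(-c for c in to_light), n)
    return max(dot(r, to_cam), 0.0) ** shininess

# Hypothetical scene: one surface point facing the screen, light straight ahead.
point, normal, light = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 2.0)
center = phong_specular(point, normal, light, (0.0, 0.0, 1.0))   # the missing camera
left = phong_specular(point, normal, light, (-0.3, 0.0, 1.0))    # a real side camera
right = phong_specular(point, normal, light, (0.3, 0.0, 1.0))    # a real side camera
print(f"center={center:.3f} left={left:.3f} right={right:.3f}")
```

In this setup the center position sees the full highlight while both side cameras see only a small fraction of it, so neither side image contains the pixel value the center camera would have recorded.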

Modern AI can try to guess, but fundamentally that info isn't anywhere in the video. It can assume the object surface is made of a small number of uniform materials, and extrapolate those materials across the picture and across frames, but this is going to fail too often for biological subjects like people.

Moving eyes means making decisions about human behavior, which is hard. Any weirdness will be very detectable. Just doing a 3D reconstruction with multiple cameras is a more established field.
> Just doing a 3D reconstruction with multiple cameras is a more established field

Yes, but that alone is not enough. You can indeed reconstruct 3D after spending enough resources, but that won’t help you find out which color the camera is going to see, because of these reflection issues. Human eyeballs are very reflective. Even if you approximate them with spheres and distort the reflections accordingly, the next subject will wear eyeglasses. The reflecting shape of those is arbitrary, so you have no chance of doing that accurately enough.

The worst-case example is a person wearing eyeglasses which are completely flat on the outside. No matter how many cameras are around the screen, none of them will capture what would reflect in the eyeglasses for a missing camera at the center of the screen.

I think people will eventually solve that, not with AI postprocessing but with hardware. You can place a camera behind the center of the screen, and split time between display and camera. For example, you light the display for 10ms, and for the next 6.66ms you turn off the display and instead read data from the camera. This will get you 60Hz of both display and camera.
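The arithmetic behind that 60Hz figure can be checked with a few lines. This is only a sketch of the timing budget; the function name and the returned values are made up for illustration and do not correspond to any real display or camera API.

```python
def multiplex_schedule(display_ms, camera_ms):
    """Timing budget for one display-then-camera cycle (hypothetical helper)."""
    period_ms = display_ms + camera_ms      # one full cycle
    rate_hz = 1000.0 / period_ms            # cycles per second, for display and camera alike
    display_duty = display_ms / period_ms   # fraction of the time the panel is lit
    return period_ms, rate_hz, display_duty

# Numbers from the comment above: 10 ms of display, 6.66 ms of camera readout.
period_ms, rate_hz, display_duty = multiplex_schedule(10.0, 6.66)
print(f"cycle: {period_ms:.2f} ms, rate: {rate_hz:.1f} Hz, "
      f"display lit {display_duty:.0%} of the time")
```

The same numbers also show the costs: the panel is lit only about 60% of the time, so it would presumably need to be driven brighter to look the same, and the camera's exposure can never exceed its 6.66ms slot.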

Yeah, I've long thought this should be pretty doable. At least with a good TOF camera.
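As a rough sketch of what "shifting the view" with per-pixel depth (e.g. from a TOF sensor) involves, the snippet below back-projects one pixel to 3D and re-projects it into a virtual camera. It assumes an ideal pinhole model, identical intrinsics and orientation for both cameras, and a pure translation between them; the intrinsics, the 0.15 m offset, and the function name are invented for illustration.

```python
def reproject(u, v, depth, fx, fy, cx, cy, offset):
    """Map a pixel with known depth from the real camera into a virtual camera.

    `offset` is the virtual camera's position in the real camera's frame,
    in the same units as `depth`. Returns None if the point ends up behind
    the virtual camera.
    """
    # Back-project the pixel to a 3D point in the real camera's frame.
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    z = depth
    # Express the point in the virtual camera's frame (translation only).
    xv, yv, zv = x - offset[0], y - offset[1], z - offset[2]
    if zv <= 0:
        return None
    # Project with the same pinhole intrinsics.
    return fx * xv / zv + cx, fy * yv / zv + cy

# Hypothetical 1280x720 webcam above the screen; virtual camera 0.15 m lower
# (image y points down, so "lower" is +y).
fx = fy = 1000.0
cx, cy = 640.0, 360.0
offset = (0.0, 0.15, 0.0)

near = reproject(640, 360, 0.6, fx, fy, cx, cy, offset)  # face at 0.6 m
far = reproject(640, 360, 1.2, fx, fy, cx, cy, offset)   # background at 1.2 m
print(near, far)
```

The same source pixel lands on different rows depending on depth: the nearer point moves twice as far as the farther one. That parallax is why a plain 2D image shift is not enough. The sketch also says nothing about occlusion holes or the view-dependent reflections discussed elsewhere in the thread.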

Most of the literature I've seen has been specifically on gaze correction, which isn't actually what you would want.

Not sure if you can still edit your comment, but you may want to put a space between those links.
For those on mobile: here are the two links, in case they don't get split above.

https://news.ycombinator.com/item?id=24151123

https://news.ycombinator.com/item?id=24151123

They're the same link; presumably an accidental double-paste. Still useful to have it working though. :)