by dwrodri 668 days ago
Tangentially related to the post: I have what I think is a related computer vision problem I would like to solve and need some pointers on how you would go about doing it.

My desk is currently set up such that I have a large monitor in the middle. I'd like to look at the center of the screen when taking calls. I'd also like it to appear as though I am looking straight into the camera, and the camera is pointed at my face. Obviously, I cannot physically place the camera right in front of the monitor as that would be seriously inconvenient. Some laptops solve this, but I don't think their methods apply here as the top of my monitor ends up being quite a bit higher than what would look "good" for simple eye correction.

I have multiple webcams that I can place around the monitor to my liking. I would like to have something similar to what is seen when you open this webpage, but for video, and hopefully at higher quality since I'm not constrained to a monocular source.

I've dabbled a bit with OpenCV in the past, but the most I've done is a little camera calibration for de-warping fisheye lenses. Any ideas on what work I should look into to get started with this?

In my head, I'm picturing two camera sources: one above and one below the monitor. The "synthetic" projected perspective would be in the middle of the two.

Is capturing a point cloud from a stereo source and then reprojecting with splats the most "straightforward" way to do this? Any and all papers/advice are welcome. I'm a little rusty on the math side but I figure a healthy mix of Szeliski's Computer Vision, Wolfram Alpha, a chatbot, and of course perseverance will get me there.
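A minimal sketch of the geometry behind that idea, assuming an already rectified stereo pair with a known focal length (in pixels), principal point, and baseline. The function and parameter names are made up for illustration, and the baseline is taken along the image x axis; with one camera above and one below the monitor, the same math applies along y instead. Each pixel is back-projected to a 3D point using its disparity, then projected into a virtual camera halfway along the baseline:

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth along the optical axis for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px


def reproject_to_midpoint(u, v, disparity_px, focal_px, cx, cy, baseline_m):
    """Map a pixel in the first camera to a virtual camera halfway along the baseline."""
    z = depth_from_disparity(disparity_px, focal_px, baseline_m)
    # Back-project the pixel to a 3D point in the first camera's frame.
    x = (u - cx) * z / focal_px
    y = (v - cy) * z / focal_px
    # Assumption: the virtual camera sits at (baseline / 2, 0, 0) and shares
    # the first camera's orientation and intrinsics.
    u_mid = focal_px * (x - baseline_m / 2.0) / z + cx
    v_mid = focal_px * y / z + cy
    return u_mid, v_mid
```

Under these assumptions the result simplifies to a shift of half the disparity along the baseline, so the point cloud never has to be stored explicitly when the virtual camera lies on the line between the two real ones.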

5 comments

This is a solved problem on some platforms (Zoom and Teams), which alter your eyes so they look like they are staring into the camera. Basically you drop your monitor down low (so the camera is more centered on your head) and let software fix your eyes.

If you want your head to actually be centered, there are also "center screen webcams" that plop into the middle of your screen during a call. There are a few types: thin webcams that drape down from the top of the monitor, and clear "webcam holders" that hold your existing webcam at the center of your screen, which are a bit less convenient.

Nvidia also has a software package you can use, but I believe it is a bit fiddly to get set up.

> Some laptops solve this, but I don't think their methods apply here as the top of my monitor ends up being quite a bit higher than what would look "good" for simple eye correction.

I appreciate the pragmatism of buying another thing to solve the problem but I am hoping to solve this with stuff I already own.

I’d be lying if I said the nerd cred of overengineering the solution wasn’t attractive as well.

If you want overengineered and some street cred: instead of changing the image to make it seem like you're looking in a new place, how about creating a virtual camera exactly where you want to look, from a 3D reconstruction?

Here's how I'd have done it in grad school a million years ago (my advisor was the main computer vision teacher at my uni).

If you have two webcams, you can put them on either side of your monitor at eye level (or halfway up the monitor), do stereo reconstruction in real time (using e.g. OpenCV), create an artificial viewpoint between the two cameras, and re-project the reconstruction to the point that is the average of the two camera positions to create a new image. Then feed that image to a virtual camera device. The Zoom call connects to the virtual camera device. (On Linux this might be as simple as setting up a /dev/ node.)
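Roughly, the re-projection step could look like the sketch below. It assumes the two images are already rectified and that you already have a left-to-right disparity map (for example from OpenCV's `cv2.StereoSGBM_create`, which is not called here). The function name is hypothetical, and this is a plain forward warp with a z-buffer, not a full 3D reconstruction:

```python
import numpy as np


def synthesize_midpoint_view(left, disparity):
    """Forward-warp a rectified left image halfway toward the right camera.

    left:      (H, W) or (H, W, C) array
    disparity: (H, W) left-to-right disparities in pixels; <= 0 means invalid
    Returns the warped image and a boolean mask of pixels that received a value.
    """
    h, w = disparity.shape
    out = np.zeros_like(left)
    filled = np.zeros((h, w), dtype=bool)

    rows, cols = np.nonzero(disparity > 0)
    d = disparity[rows, cols]
    # A point at column x in the left image appears at x - d in the right
    # image, so a camera halfway between sees it at about x - d / 2.
    target = np.floor(cols - d / 2.0 + 0.5).astype(int)
    inside = (target >= 0) & (target < w)
    rows, cols, target, d = rows[inside], cols[inside], target[inside], d[inside]

    # Z-buffer: where several source pixels land on the same target pixel,
    # keep the one with the largest disparity (the nearest surface).
    zbuf = np.zeros((h, w), dtype=float)
    np.maximum.at(zbuf, (rows, target), d)
    nearest = d == zbuf[rows, target]
    rows, cols, target = rows[nearest], cols[nearest], target[nearest]

    out[rows, target] = left[rows, cols]
    filled[rows, target] = True
    return out, filled
```

Pixels where `filled` is False are holes (occlusions or bad disparities). One option is to warp the right image the same way in the opposite direction and use it to fill them.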

It's much easier to reconstruct a little left / right of a face when you have both left and right images, than it is to reconstruct higher / lower when you have only above or below. This is because faces are not symmetric up/down.

This would work. It would be kinda janky, but it can be done in real time on modern hardware with cheap webcams, Python, and some coding.

The hardest part is creating the virtual webcam device that the Zoom call would connect to, but my guess is there's a pip package for that.
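One candidate is the third-party `pyvirtualcam` package. A minimal sketch, assuming it is installed along with a virtual camera backend it supports (v4l2loopback on Linux, for example) and that your frames are OpenCV-style BGR `uint8` arrays; the helper names here are made up:

```python
import numpy as np


def bgr_to_rgb_frame(frame_bgr):
    """Convert an OpenCV-style BGR uint8 image to a contiguous RGB array."""
    if frame_bgr.dtype != np.uint8 or frame_bgr.ndim != 3 or frame_bgr.shape[2] != 3:
        raise ValueError("expected an (H, W, 3) uint8 image")
    return np.ascontiguousarray(frame_bgr[:, :, ::-1])


def stream(frames_bgr, width, height, fps=30):
    """Push frames to a virtual webcam that a video-call app can select."""
    # Imported lazily so the helper above works without the package installed.
    import pyvirtualcam  # third-party: pip install pyvirtualcam

    with pyvirtualcam.Camera(width=width, height=height, fps=fps) as cam:
        for frame in frames_bgr:
            # Each frame must match the width and height given above.
            cam.send(bgr_to_rgb_frame(frame))
            cam.sleep_until_next_frame()
```

`frames_bgr` can be any iterable, such as a generator that yields the synthesized view for each captured stereo pair.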

Any imager would do, but quality would improve with:

* Synchronized capture - e.g., an edge-triggered camera with, say, a Raspberry Pi triggering capture

* Additional range information, say, from a Kinect or cell phone lidar

* A little delay to buffer frames so you can do time-series matching and interpolation
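Without hardware triggering, a cheap approximation of the first and last points is to timestamp every frame on arrival, buffer a few, and only match frames whose timestamps are close. A minimal sketch, assuming each camera's timestamps are in seconds and sorted ascending (the function name and the `max_skew` parameter are made up):

```python
from bisect import bisect_left


def pair_by_timestamp(times_a, times_b, max_skew):
    """Match each timestamp in times_a to the closest one in times_b.

    Both lists must be sorted ascending. Returns (i, j) index pairs whose
    timestamps differ by at most max_skew; frames with no partner are dropped.
    """
    pairs = []
    for i, t in enumerate(times_a):
        k = bisect_left(times_b, t)
        # The nearest neighbour is either just before or just after position k.
        candidates = [j for j in (k - 1, k) if 0 <= j < len(times_b)]
        if not candidates:
            continue
        j = min(candidates, key=lambda idx: abs(times_b[idx] - t))
        if abs(times_b[j] - t) <= max_skew:
            pairs.append((i, j))
    return pairs
```

Note that a frame from the second camera can be matched more than once if the first camera runs faster. Whether that is acceptable depends on how much motion you expect between frames.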

Have you seen the work done with multiple Kinect cameras in 2015? https://www.inavateonthenet.net/news/article/kinect-camera-a...

Creating a depth field with a monocular camera is now possible, so that may help you get further with this.

One approach you could try is to use the webcam input to create a deepfake that you place onto a 3D model, one you can rotate around.

It should be doable real-time, but might be stuck in the uncanny valley.

Also maybe look at what Meta and Apple's Vision Pro are doing to create their avatars.

Nvidia Broadcast does a pretty good job of deepfaking your eyes to look at the camera... then again that's not as fun
FaceTime can do the eye correction.