I just discovered that Zoom teleconferencing software has the ability to detect the background and replace it with an arbitrary video in real time. I have no idea how they do it, but it's as impressive as heck.
Yes, it did not work for me sitting with a laptop on a swing chair. It actually works better with clear static features in the background, but it seems to be doing a bit more than that (a CNN?), because it seems to isolate paintings on the wall for me as well. I wish there were more open-source, easy-to-use tools for camera stream manipulation. Skype also has background blurring.
It is a good early version of such a feature, but I've found it to fail fairly quickly in uneven light or with busy backgrounds. Still, it is a great feature, so I'm hopeful that they'll keep improving it and other services will match it.
It’s likely some type of segmentation network like Mask R-CNN or SegNet. Look up Mask R-CNN for some state-of-the-art segmentation results. This kind of segmentation has been able to run on mobile phones in real time for years now.
Jitsi has this feature in beta too. It works pretty well, although it currently uses way too much CPU, and doesn't work quite as well as it does in MS Teams.
- A background scene is usually static. If a pixel matches its historical average (i.e. it is not changing), then it's probably background.
- If a pixel's color matches its neighbors, it probably belongs to the same region as them. I noticed my black shirt tripped it up on occasion, but it was all-or-nothing.
- People are blobs, not diffuse, so try to segment large regions.
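For anyone who wants to play with those three heuristics, here is a rough NumPy sketch. To be clear, this is not how Zoom or anyone else actually does it; it is a toy under my own assumptions. The function names, the `alpha` learning rate, the `threshold` of 30 and the 3x3 majority vote are all made up for illustration, and the neighbor rule is approximated by voting on the mask rather than by comparing colors.

```python
from collections import deque

import numpy as np


def update_background(background, frame, alpha=0.05):
    """Heuristic 1: keep an exponential running average per pixel.

    `alpha` is a hypothetical learning rate chosen for illustration.
    """
    return (1.0 - alpha) * background + alpha * frame


def foreground_mask(background, frame, threshold=30.0):
    """Pixels far from their historical average are probably not background."""
    diff = np.abs(frame.astype(np.float64) - background)
    if diff.ndim == 3:  # color frame: use the largest channel difference
        diff = diff.max(axis=2)
    return diff > threshold


def smooth_with_neighbors(mask):
    """Heuristic 2 (approximated): each pixel takes the majority label
    of its 3x3 neighborhood, which removes isolated specks."""
    h, w = mask.shape
    padded = np.pad(mask.astype(np.int32), 1, mode="edge")
    votes = sum(padded[dy:dy + h, dx:dx + w]
                for dy in range(3) for dx in range(3))
    return votes >= 5


def largest_blob(mask):
    """Heuristic 3: keep only the largest 4-connected foreground region."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    best = []
    for y, x in zip(*np.nonzero(mask)):
        if seen[y, x]:
            continue
        seen[y, x] = True
        queue, blob = deque([(y, x)]), []
        while queue:  # breadth-first flood fill
            cy, cx = queue.popleft()
            blob.append((cy, cx))
            for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        if len(blob) > len(best):
            best = blob
    out = np.zeros((h, w), dtype=bool)
    if best:
        ys, xs = zip(*best)
        out[list(ys), list(xs)] = True
    return out


def segment_person(background, frame):
    """Chain the three heuristics into one person-vs-background mask."""
    return largest_blob(smooth_with_neighbors(foreground_mask(background, frame)))
```

In a real loop you would call `update_background` on every frame, ideally only for pixels currently labeled background so that a person sitting still does not fade into the model. The pure-Python flood fill is far too slow for live video at full resolution; it is only there to keep the sketch dependency-free.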