Hacker News
by julvo 2299 days ago
My guess would be U-Net-like ConvNets trained on images annotated with foreground/background segmentations, probably with all kinds of tricks such as multi-scale inference.

However, simple frame-by-frame segmentation will probably not be enough to get temporal consistency, so for each frame's segmentation they probably take previous and following frames into account.
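To make the flicker problem concrete, here is a minimal sketch of the crudest possible fix: averaging each frame's mask with its neighbours after the fact. This is a stand-in for illustration, not what the product does; a model that takes neighbouring frames as input would be a different (and likely better) approach. The name `smooth_masks` and the `radius` parameter are made up for this example, and masks are assumed to be per-pixel foreground probabilities of equal size.

```python
from typing import List

Mask = List[List[float]]  # per-pixel foreground probability in [0, 1]


def smooth_masks(masks: List[Mask], radius: int = 1) -> List[Mask]:
    """Average each frame's mask with the masks up to `radius` frames away.

    Hypothetical helper: assumes all masks share the same height and width.
    """
    smoothed = []
    for t in range(len(masks)):
        # Clamp the window at the start and end of the clip.
        lo = max(0, t - radius)
        hi = min(len(masks), t + radius + 1)
        window = masks[lo:hi]
        height, width = len(masks[t]), len(masks[t][0])
        frame = [
            [sum(m[y][x] for m in window) / len(window) for x in range(width)]
            for y in range(height)
        ]
        smoothed.append(frame)
    return smoothed
```

A pixel that drops out for a single frame (1, 0, 1) ends up at 2/3 in the middle frame, so thresholding at 0.5 keeps it as foreground. The price is smeared edges on fast motion, which is one reason to expect something smarter in practice.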

That is incredibly insightful. For someone with no knowledge of this field, where would one start if they want to remove the background from images programmatically?

Depending on the type of image, a simple solution could be using OpenCV and some clever heuristics.

For a deep learning approach, I would start by looking into the literature on semantic segmentation. Here is a blog post I just found that gives an intro: [1]

With state-of-the-art models (e.g. DeepLabV3) and a good dataset of foreground/background segmentations, the results could already be of usable quality.

The next step would be to look into the literature on image matting (e.g. deep image matting [2]), which, instead of trying to classify each pixel as foreground or background, regresses the foreground colour and transparency.
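The reason transparency matters: matting treats each observed pixel as a blend, alpha * foreground + (1 - alpha) * background, so once you have the foreground colour and alpha you can place the subject over any new background. A minimal per-pixel sketch of that blend (the names `Colour` and `composite` are just for this example, with colours assumed to be RGB values in [0, 1]):

```python
from typing import Tuple

Colour = Tuple[float, float, float]  # RGB, each channel in [0, 1]


def composite(foreground: Colour, background: Colour, alpha: float) -> Colour:
    """Blend one pixel: alpha * foreground + (1 - alpha) * background."""
    return tuple(
        alpha * f + (1.0 - alpha) * b for f, b in zip(foreground, background)
    )
```

With a hard segmentation mask, alpha can only be 0 or 1, which is why hair and semi-transparent edges tend to look cut out; a matte allows values in between.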

___

[1] https://divamgupta.com/image-segmentation/2019/06/06/deep-le...

[2] https://arxiv.org/abs/1703.03872

Thanks for the reply. This will make for a great weekend project.

I have some knowledge of building an OCR program with deep learning from the last online course I took, but this looks like a very different beast, so it should be great fun to learn.