|
|
|
|
|
by julvo
2299 days ago
|
|
My guess would be U-Net-like ConvNets, trained on images annotated with foreground/background segmentations. Probably with all kinds of tricks like multi-scale inference etc. However, simple frame-by-frame segmentation will probably not be enough to get temporal consistency, so for each frame's segmentation they probably take previous and following frames into account. |
|