Hacker News
by alsodumb 688 days ago
I might be in the minority, but I am not that surprised by the results or by the not-so-significant GPU hours. I've been doing video segment tracking for a while now, using SAM for mask generation and one of the robust academic video object segmentation models for tracking the mask (see CUTIE: https://hkchengrex.com/Cutie/, presented at CVPR this year).

I still need to read the SAM2 paper, but point 4 seems a lot like what Rex has in CUTIE. CUTIE can consistently track segments across video frames even if they get occluded or go out of frame for a while.
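To make the "generate masks once, then propagate them" workflow concrete, here is a toy tracker. It is not how CUTIE works (CUTIE uses learned memory modules, not a hand-written matching rule); it is a minimal stdlib-only sketch, assuming masks arrive as pixel sets from some per-frame segmenter. The names `iou`, `track_mask`, `square` and the `min_iou` threshold are hypothetical. The only idea it illustrates is keeping the last confident mask in memory so that an object hidden for a frame or two can be re-acquired.

```python
from typing import FrozenSet, List, Optional, Tuple

Pixel = Tuple[int, int]
Mask = FrozenSet[Pixel]


def iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two binary masks stored as pixel sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def track_mask(initial: Mask, frames: List[List[Mask]],
               min_iou: float = 0.3) -> List[Optional[Mask]]:
    """Follow one object through frames of candidate masks (hypothetical toy).

    The last confidently matched mask acts as 'memory'. When no candidate
    overlaps it enough (occlusion, out of frame), the frame is reported as
    None and the memory is left unchanged, so the object can be picked up
    again when it re-emerges near where it was last seen.
    """
    memory = initial
    track: List[Optional[Mask]] = []
    for candidates in frames:
        best = max(candidates, key=lambda m: iou(memory, m), default=None)
        if best is not None and iou(memory, best) >= min_iou:
            memory = best
            track.append(best)
        else:
            track.append(None)  # object not visible this frame
    return track


def square(x: int, y: int, size: int = 4) -> Mask:
    """Hypothetical helper: a size x size square mask with corner (x, y)."""
    return frozenset((i, j) for i in range(x, x + size) for j in range(y, y + size))


# Usage: the object drifts right, disappears for one frame, then re-emerges.
frames = [
    [square(1, 0), square(20, 20)],  # object moved slightly; distractor far away
    [square(20, 20)],                # object hidden; only the distractor is visible
    [square(2, 0)],                  # object visible again
]
result = track_mask(square(0, 0), frames)
```

Here `result` is `[square(1, 0), None, square(2, 0)]`: the distractor is never adopted, and the object is recovered after the gap. A real model also has to handle large motion during occlusion, which simple overlap matching cannot.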

Seems like there's functional overlap between segmentation models and the autofocus algorithms developed by Canon and Sony for their high-end cameras.

The Canon R1, for example, will not only continually track a particular object even if it is partially occluded, but will also pre-focus on where it predicts the object will be when it emerges from being totally hidden. It can also be programmed by the user to focus on a particular face to the exclusion of all else.
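At its simplest, "pre-focusing where the subject will re-emerge" is motion extrapolation. The comment does not say how Canon does it, so the sketch below is only a hypothetical constant-velocity predictor (a much-simplified cousin of a Kalman filter), not Canon's method. The class name `ConstantVelocityPredictor` and its `update` interface are invented for illustration, and positions are assumed to be 2D image coordinates reported once per frame.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, float]


@dataclass
class ConstantVelocityPredictor:
    """Hypothetical predictor: while the subject is hidden, assume it keeps
    moving at the per-frame velocity observed before it disappeared."""
    last: Optional[Point] = None
    velocity: Point = (0.0, 0.0)

    def update(self, observation: Optional[Point]) -> Optional[Point]:
        """Feed one frame's detection (None if hidden); return the estimate."""
        if observation is None:
            if self.last is None:
                return None  # nothing seen yet, nothing to extrapolate
            # Subject hidden: step the last estimate forward by one frame.
            self.last = (self.last[0] + self.velocity[0],
                         self.last[1] + self.velocity[1])
            return self.last
        if self.last is not None:
            # Re-estimate velocity from the previous estimate to this sighting.
            self.velocity = (observation[0] - self.last[0],
                             observation[1] - self.last[1])
        self.last = observation
        return observation


# Usage: the subject moves steadily, is hidden for two frames, then reappears.
observations = [(0.0, 0.0), (2.0, 1.0), None, None, (8.0, 4.0)]
predictor = ConstantVelocityPredictor()
estimates = [predictor.update(o) for o in observations]
```

During the two hidden frames the estimates are `(4.0, 2.0)` and `(6.0, 3.0)`, so a focus system built on this idea would already be set near `(8.0, 4.0)` when the subject reappears. Real trackers also model uncertainty and changing velocity, which this sketch ignores.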

Of course, Facebook has had a video tracking ML model, CoTracker [1], for a year or so; it just tracks pixels rather than segments.

[1] https://co-tracker.github.io/