Hacker News new | ask | show | jobs
by Q6T46nT668w6i3m 1458 days ago
Yes, look into instance or panoptic segmentation. The most popular method is a region-based network that jointly regresses bounding box coordinates alongside an object mask and class label.
1 comments

Thanks. The next step would be combining it with text-image foundation models such as clip https://github.com/openai/CLIP so that the model no longer depends on a limited set of predefined labels (coco…), right?

Also occlusion inference would be fantastic, so that we can select between the visible parts of the object and the whole shape (behind trees etc).

Exciting decade.