The SAM-family of computer-vision models have made many of the human annotation services and tools somewhat redundant, as it's possible to achieve relatively high-quality auto-labeling of vision data.
This is probably true for simple objects, but there is almost certainly a market for hiring people who use SAM-based tools (or similar) to label with some level of QA. I've tried a few implementations and they struggle with complex objects and can be quite slow (due to GPU overhead). Some platforms have had some variant of "click guided" labelling for a while (eg V7) but they're not cheap to use.
Prompt guided labelling is also pretty cool, but still in infancy (eg you can tell the model "label all the shadows"). Seg GPT for example. But now we're right back to LLMs...
On labelling, there is still a dearth of high quality niche datasets ($$$). Everyone tests on MS-COCO and the same 5-6 segmentation datasets. Very few papers provide solid instructions for fine tuning on bespoke data.
That's basically what we are able to do now: showing models an image (or images, from video) and prompting for labels, such as with "person, soccer player".
Prompt guided labelling is also pretty cool, but still in infancy (eg you can tell the model "label all the shadows"). Seg GPT for example. But now we're right back to LLMs...
On labelling, there is still a dearth of high quality niche datasets ($$$). Everyone tests on MS-COCO and the same 5-6 segmentation datasets. Very few papers provide solid instructions for fine tuning on bespoke data.