Hacker News
by isoprophlex 522 days ago
This was exactly my experience as the ML engineer on a predictive maintenance project. We detected broken traffic signs in video feeds from trucks; first you segment, then you classify.

Simply yeeting every "object of interest" into DINOv2 and running any cheap classifier on the resulting embeddings was a game changer.


Could you elaborate? I thought DINO took images and output segmented objects. Or do you mean that your first step was something like a YOLO model to get bounding boxes, and you are just using DINO to segment to make the classification part easier?
We did indeed get bboxes from YOLO to identify "here is a traffic sign", "here is a traffic light" etc. Then we cropped out these objects of interest and computed the DINOv2 embedding of each crop.

We're not using it to create segmentations (there are YOLO models that do that, so if you need a segmentation you can get it in one pass); we only use it to get a single vector representing each crop.
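
For anyone who wants to try the crop-then-embed step, here is a rough sketch, not our actual code. The function names (`crop_boxes`, `embed_crops`, `make_dinov2_embed_fn`) are made up for illustration, boxes are assumed to be pixel `(x1, y1, x2, y2)` tuples from whatever detector you use, and the 224x224 resize plus ImageNet normalization is an assumed preprocessing choice. The loader relies on the `torch.hub` entry point from the public DINOv2 repo and downloads weights on first use.

```python
from typing import Callable, List, Sequence, Tuple

import numpy as np

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2) in pixels


def crop_boxes(image: np.ndarray, boxes: Sequence[Box]) -> List[np.ndarray]:
    """Cut each detector box out of an HxWx3 image, clamping to the image bounds."""
    height, width = image.shape[:2]
    crops = []
    for x1, y1, x2, y2 in boxes:
        left, top = max(0, int(x1)), max(0, int(y1))
        right, bottom = min(width, int(x2)), min(height, int(y2))
        if right <= left or bottom <= top:
            raise ValueError(f"empty box after clamping: {(x1, y1, x2, y2)}")
        crops.append(image[top:bottom, left:right])
    return crops


def embed_crops(
    crops: Sequence[np.ndarray],
    embed_fn: Callable[[np.ndarray], np.ndarray],
) -> np.ndarray:
    """Turn every crop into one vector; returns an array of shape (n_crops, dim)."""
    return np.stack([np.asarray(embed_fn(crop)) for crop in crops])


def make_dinov2_embed_fn(model_name: str = "dinov2_vits14"):
    """Build an embed_fn backed by DINOv2 (needs torch; downloads weights)."""
    import torch
    import torch.nn.functional as F

    model = torch.hub.load("facebookresearch/dinov2", model_name)
    model.eval()
    # ImageNet statistics; an assumed (but common) normalization for DINOv2 inputs.
    mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
    std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

    def embed_fn(crop: np.ndarray) -> np.ndarray:
        # HxWx3 uint8 RGB -> 1x3xHxW float in [0, 1]
        x = torch.from_numpy(np.ascontiguousarray(crop))
        x = x.permute(2, 0, 1).unsqueeze(0).float() / 255.0
        # Side lengths must be multiples of the 14-pixel patch size; 224 = 16 * 14.
        x = F.interpolate(x, size=(224, 224), mode="bilinear", align_corners=False)
        x = (x - mean) / std
        with torch.no_grad():
            features = model(x)  # one embedding vector per input image
        return features[0].cpu().numpy()

    return embed_fn
```

Usage would be something like `embeddings = embed_crops(crop_boxes(frame, boxes), make_dinov2_embed_fn())`, after which you save `embeddings` next to your labels.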

Our goal was not only to know "this is a traffic sign", but also to do multilabel classification like "has graffiti", "has deformations", "shows discoloration" etc. If you store the embeddings it becomes pretty trivial (and hella fast) to pass them off to a bunch of data scientists so they can let loose all the classifiers in sklearn on them. See [1] for a substantially similar example.

[1] https://blog.roboflow.com/how-to-classify-images-with-dinov2
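
To make the "let sklearn loose on stored embeddings" part concrete, here is a minimal sketch under stated assumptions. The embeddings and labels below are random synthetic stand-ins, not real data; the label names just mirror the examples above; and one-vs-rest logistic regression is only one of many cheap classifiers you could swap in. The 384-dimensional size is an assumption matching the small DINOv2 variant.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

# Illustrative label names taken from the examples in this thread.
LABELS = ["has_graffiti", "has_deformations", "shows_discoloration"]


def make_synthetic_data(n_samples: int = 600, dim: int = 384, seed: int = 0):
    """Synthetic stand-in for stored embeddings and their multilabel annotations."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, dim)).astype(np.float32)
    # Each label is a random linear function of the embedding, thresholded at 0.
    W = rng.normal(size=(dim, len(LABELS)))
    Y = (X @ W > 0).astype(int)  # shape (n_samples, n_labels), entries 0 or 1
    return X, Y


def train_multilabel(X: np.ndarray, Y: np.ndarray):
    """Fit one independent binary classifier per label on fixed embeddings."""
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=0.25, random_state=0
    )
    clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
    clf.fit(X_train, Y_train)
    return clf, X_test, Y_test


X, Y = make_synthetic_data()
clf, X_test, Y_test = train_multilabel(X, Y)
predictions = clf.predict(X_test)  # binary indicator matrix, one column per label
for name, column in zip(LABELS, predictions.T):
    print(f"{name}: {int(column.sum())} of {len(column)} crops flagged")
```

Because the embeddings are computed once and stored, trying a different classifier only means changing the estimator passed to `OneVsRestClassifier`; nothing upstream has to be rerun.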

Understood. Thanks for taking the time to elaborate.