Hacker News
by yagami_takayuki 895 days ago
For recognizing features such as hair and skin color, which would do a better job? Machine learning with image classification? Or a vector database?

I've had Weaviate return a dog given a human as input when doing an image similarity search, so I was wondering if there's some way to improve the results or whether I'm barking up the wrong tree.

4 comments

For those two in particular? You'd definitely get a better result with an ML model such as a convolutional neural net. In some sense, using an image similarity query is a kind of ML model - nearest neighbor - which can work in some scenarios. But for this specifically, I'd recommend a CNN.
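To make the "similarity query is a nearest-neighbor model" point concrete, here is a minimal sketch in plain Python. The 3-dimensional vectors and the hair-color labels are made-up stand-ins for real image embeddings, and `knn_predict` is a hypothetical helper, not part of any vector database API.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_predict(query, labelled_embeddings, k=3):
    """Majority vote over the k stored embeddings most similar to the query."""
    ranked = sorted(
        labelled_embeddings,
        key=lambda item: cosine_similarity(query, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy (embedding, label) pairs; real embeddings would come from an image model.
labelled = [
    ([0.9, 0.1, 0.0], "black hair"),
    ([0.8, 0.2, 0.1], "black hair"),
    ([0.1, 0.9, 0.1], "blond hair"),
    ([0.2, 0.8, 0.0], "blond hair"),
    ([0.1, 0.1, 0.9], "red hair"),
]

prediction = knn_predict([0.85, 0.15, 0.05], labelled, k=3)
print(prediction)
```

The quality of the vote depends entirely on whether the embedding space separates the attribute you care about, which is why a general-purpose embedding can rank a dog next to a human.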
Thanks, will experiment with a CNN.
I would think you could partially address that issue by improving your embedding space. Similarity search (on embeddings trained with some contrastive loss) definitely suffers at the tails of the distribution, and results on out-of-distribution (OOD) inputs are pretty bad. That being said, you're more likely to get higher recall than with a more classical technique.
Thanks for your answer!
You could consider something like BLIP2. There are multiple ways you could use it: embed images and match them against embeddings of text descriptors, train a custom embedding of the descriptors, or train a classifier on top of the image embedding (a linear layer on top of the image embedding network).

These approaches are listed in order of increasing complexity. This also lets you bootstrap a dataset:

Let's say you want to classify cats by breed. You could start by embedding images and text descriptors and distance-matching the embedded descriptors to the images. This gives you a dataset that might be 90% correct; you can then clean it up, which would be easier to do than manually labelling it. Based on that improved dataset, you can train a custom embedding for the labels or a classification layer on top of the image embedding network.
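A rough sketch of the first bootstrap step, assuming the image and text embeddings have already been computed elsewhere. The 2-dimensional vectors, the breed names, the `pseudo_label` helper, and the `min_margin` threshold are all illustrative assumptions, not BLIP2 output or API. Each image gets its closest descriptor, and images where the top two descriptors score nearly the same are flagged for manual cleanup.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pseudo_label(image_embeddings, descriptor_embeddings, min_margin=0.1):
    """Label each image with its closest descriptor.

    An image is flagged for review when the best and second-best
    descriptors score within `min_margin` of each other.
    Requires at least two descriptors.
    """
    results = {}
    for image_id, image_vec in image_embeddings.items():
        scored = sorted(
            (
                (cosine_similarity(image_vec, vec), label)
                for label, vec in descriptor_embeddings.items()
            ),
            reverse=True,
        )
        best_score, best_label = scored[0]
        second_score, _ = scored[1]
        results[image_id] = {
            "label": best_label,
            "needs_review": (best_score - second_score) < min_margin,
        }
    return results


# Toy embeddings standing in for text-descriptor and image vectors.
descriptors = {"siamese": [1.0, 0.0], "maine coon": [0.0, 1.0]}
images = {
    "img_001": [0.9, 0.1],
    "img_002": [0.2, 0.8],
    "img_003": [0.52, 0.48],  # ambiguous: almost equally close to both
}

labels = pseudo_label(images, descriptors)
for image_id, info in labels.items():
    print(image_id, info)
```

Only the flagged images need a human look, which is where the saving over labelling everything by hand comes from. The cleaned labels can then be used to train the classification layer.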

Thank you, will look into BLIP2.
You don't use a vector database on its own; you need to feed it embeddings produced by an ML model.

For your use-case, it should be pretty simple. You could train a CNN, or use YOLO, Deepface, or another face detection algorithm, and then find the hair and skin regions within the detected face.

From there you can use a vector database to get the colors that resemble other inputs, or you can use a simple CNN to classify the hair and skin to the closest label.
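As a very simple baseline for the "closest label" idea, here is a sketch that averages the pixels of an already-segmented region and picks the nearest reference color. The palette values, the pixel samples, and the helper names are assumptions made up for illustration, and plain Euclidean distance in RGB is a crude stand-in for what a trained classifier would learn.

```python
def mean_color(pixels):
    """Average RGB value of a non-empty list of (r, g, b) pixels."""
    n = len(pixels)
    return tuple(sum(p[channel] for p in pixels) / n for channel in range(3))


def closest_label(color, palette):
    """Return the palette label with the smallest squared RGB distance."""
    return min(
        palette,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(color, palette[label])),
    )


# Hypothetical reference colors; a real palette would need tuning.
HAIR_PALETTE = {
    "black": (20, 20, 20),
    "brown": (90, 60, 40),
    "blond": (220, 190, 130),
    "red": (160, 60, 30),
}

# Pixels sampled from a hair region found by an earlier detection step.
hair_pixels = [(85, 58, 42), (95, 65, 38), (80, 55, 45)]

hair_label = closest_label(mean_color(hair_pixels), HAIR_PALETTE)
print(hair_label)
```

Lighting changes shift raw RGB values a lot, so this is only a sanity check to compare a CNN or vector search against.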

Thanks! Will look into this.