Hacker News
by hannah-pdx 1400 days ago
As part of developing a system that needs to search by image, we had to compute feature vectors for use with the Pinecone vector database. All the research we could find focused either on ML approaches, which were untenable because we hope to generate vectors in the browser, or on Hamming-distance vector comparison, which is untenable for large-scale search.
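For context on how the two representations relate: a bit hash compared by Hamming distance can be rewritten as a vector of +1/-1 values, and for such vectors the dot product equals n - 2 * hamming, so an index that ranks by dot product (or by cosine, since every such vector has the same length) orders results the same way. This is a minimal sketch under that assumption; the hash values are made up, and `hamming`, `bits_to_pm1`, and `dot` are hypothetical helpers, not code from the linked repo or from Pinecone.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer-encoded hashes."""
    return bin(a ^ b).count("1")

def bits_to_pm1(h: int, n_bits: int) -> list[float]:
    """Map each bit of an n-bit hash to +1.0 (bit set) or -1.0 (bit clear)."""
    return [1.0 if (h >> i) & 1 else -1.0 for i in range(n_bits)]

def dot(u: list[float], v: list[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

n = 64
a = 0x0F0F_0F0F_0F0F_0F0F
b = 0x0F0F_0F0F_0F0F_0F0E   # made-up hash differing from `a` in the lowest bit only
va, vb = bits_to_pm1(a, n), bits_to_pm1(b, n)

# Matching bits contribute +1 and differing bits -1, so dot = n - 2 * hamming.
print(hamming(a, b), dot(va, vb))  # 1 62.0
```

Because every +1/-1 vector here has length sqrt(n), cosine similarity is just dot / n, so either metric preserves the Hamming ranking. This only shows the representations are interchangeable; it says nothing about how well a given index scales.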

The README here documents my research into the performance of several algorithms, and the repo contains the code used to gather the data.

The site alt-text.org is still alpha quality and under active development. The library backing it is quite small, so most searches will fail, but feel free to play around with it.

Twitter users can help build the library with the link in the upper right corner, though it does not yet work on mobile.


Why is vector generation in the browser untenable with ML approaches? There are tiny models, and there's TensorFlow.js.
Hmm, the smallest model I see is still 4.3 MB. Are there smaller ones?

From a quick read of the published stats, the smaller model's accuracy suffers considerably. I could definitely see running similar tests on it, though!

All that said, as I understand it, the matching TensorFlow offers also isn't exactly what I'm after. I'm primarily concerned with matching images that look identical to a human, possibly with small modifications such as size changes. Think "are these two images identical?" rather than "give me pictures of dogs."
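To make "identical to humans, possibly resized" concrete, here is a toy feature vector that ignores uniform resizing: average the grayscale pixels into a fixed k x k grid, then mean-center and L2-normalize so cosine similarity becomes a plain dot product. This is a hypothetical illustration (`grid_features` and `cosine` are made-up names), not one of the algorithms benchmarked in the README, and it makes no attempt to handle crops or rotations.

```python
import math

Image = list[list[float]]  # grayscale, row-major, values 0..255

def grid_features(img: Image, k: int = 8) -> list[float]:
    """Average pixels into a k x k grid, then mean-center and L2-normalize."""
    h, w = len(img), len(img[0])
    if h < k or w < k:
        raise ValueError("image must be at least k x k so every cell gets a pixel")
    sums = [0.0] * (k * k)
    counts = [0] * (k * k)
    for y in range(h):
        cy = (y * k) // h          # grid row this pixel falls in
        for x in range(w):
            cx = (x * k) // w      # grid column this pixel falls in
            sums[cy * k + cx] += img[y][x]
            counts[cy * k + cx] += 1
    cells = [s / c for s, c in zip(sums, counts)]
    mean = sum(cells) / len(cells)
    centered = [c - mean for c in cells]     # removes uniform brightness shifts
    norm = math.sqrt(sum(c * c for c in centered)) or 1.0  # flat image -> zero vector
    return [c / norm for c in centered]

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; inputs from grid_features are already unit length."""
    return sum(a * b for a, b in zip(u, v))

base = [[float(x * 24 + y * 8) for x in range(8)] for y in range(8)]
bigger = [[base[y // 2][x // 2] for x in range(16)] for y in range(16)]  # 2x nearest-neighbour upscale
inverted = [[255.0 - p for p in row] for row in base]

f_base = grid_features(base, k=4)
print(cosine(f_base, grid_features(bigger, k=4)))    # close to 1.0
print(cosine(f_base, grid_features(inverted, k=4)))  # close to -1.0
```

In this example the upscaled copy maps to the same vector because each grid cell averages the same source pixels, while the inverted image points the opposite way. Real resizing with interpolation will only be approximately equal, which is what thresholding on similarity is for.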

There is also metric learning, which can learn a hash for similarity: https://github.com/tensorflow/similarity
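Metric learning trains an embedding so that matching images land close together and non-matching ones far apart. One common objective for this is the triplet loss; below is a minimal plain-Python sketch using squared Euclidean distance and an assumed margin of 0.2. It is a generic illustration of the objective, not the TensorFlow Similarity API, and the function names and toy embeddings are hypothetical.

```python
def sq_dist(u: list[float], v: list[float]) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor: list[float], positive: list[float],
                 negative: list[float], margin: float = 0.2) -> float:
    """Zero once the negative is at least `margin` farther from the anchor than the positive."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)

anchor = [0.0, 0.0]
positive = [0.1, 0.0]    # e.g. embedding of a resized copy of the anchor image
easy_neg = [1.0, 0.0]    # clearly different image, already far away
hard_neg = [0.15, 0.0]   # different image the model currently places too close

print(triplet_loss(anchor, positive, easy_neg))  # 0.0
print(triplet_loss(anchor, positive, hard_neg))  # about 0.1875
```

Only the hard negative produces a gradient signal, which is why training setups usually go out of their way to find such near-miss examples.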

That said, I don't see many good models optimized for it available for download on TF Hub or Hugging Face. As a possible alternative, though, if you truly mean identical to humans, you can programmatically modify your images (change white balance, crop, rotate, select adjacent frames from videos, etc.), then train a network small enough to satisfy you and see if that works.
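The augmentation idea above can be sketched as a function that turns one image into two randomly modified "views" that should count as a match. This is a toy, assuming an RGB image stored as nested lists of (r, g, b) tuples; the helper names, the 90% crop size, and the plus-or-minus 10% white-balance range are illustrative choices. A real pipeline would use an image library and would also add pairs such as adjacent video frames.

```python
import random

Pixel = tuple[float, float, float]
RGBImage = list[list[Pixel]]

def crop(img: list, top: int, left: int, h: int, w: int) -> list:
    """Return the h x w sub-image whose top-left corner is (top, left)."""
    return [row[left:left + w] for row in img[top:top + h]]

def rotate90(img: list) -> list:
    """Rotate 90 degrees clockwise (reverse the rows, then transpose)."""
    return [list(row) for row in zip(*img[::-1])]

def white_balance(img: RGBImage, gains: tuple[float, float, float]) -> RGBImage:
    """Scale R, G, B by per-channel gains, clamping at 255."""
    return [[tuple(min(255.0, c * g) for c, g in zip(px, gains)) for px in row]
            for row in img]

def random_view(img: RGBImage, rng: random.Random) -> RGBImage:
    """One random crop, an optional rotation, and a small colour shift."""
    h, w = len(img), len(img[0])
    ch, cw = max(1, h * 9 // 10), max(1, w * 9 // 10)  # keep ~90% of each side
    top, left = rng.randint(0, h - ch), rng.randint(0, w - cw)
    out = crop(img, top, left, ch, cw)
    if rng.random() < 0.5:
        out = rotate90(out)
    gains = tuple(rng.uniform(0.9, 1.1) for _ in range(3))
    return white_balance(out, gains)

def make_positive_pair(img: RGBImage, rng: random.Random) -> tuple[RGBImage, RGBImage]:
    """Two independently augmented views of the same image, treated as a match."""
    return random_view(img, rng), random_view(img, rng)

img = [[(float(x), float(y), 0.0) for x in range(20)] for y in range(10)]
view_a, view_b = make_positive_pair(img, random.Random(0))
print(len(view_a), len(view_a[0]))  # 9 18, or 18 9 if rotated
```

Pairs like these could feed the anchor/positive slots of a metric-learning loss, with views of other images serving as negatives.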