by Isn0gud 1863 days ago
Do these complex queries work on Google Photos, Apple Photos, or whatever the state-of-the-art image search is? With search, performance is key, which makes anything O(n^2) unusable.

For personal data search, the number of images that need to be searched is pretty small.

Typically, rather than building an index, it's best to do as much precomputation as possible so that a linear scan is fast. That scales up to 1M+ images per user.

The usual approach is to preprocess every image with a neural net. Its input is the image, its metadata, some info from other images taken at the same location or on the same day, text from the web looked up from the location coordinates, and anything else that might help answer a user's query. Its output is an embedding vector of, say, 8192 elements.

Then, when the query comes in, feed it to a big pretrained language model with a fine-tuned embedding layer to get another vector.

Then, for each image in the user's account (1 million plus), run a tiny neural net to check whether the image might be relevant. Such a network might have only a few thousand weights and may operate on only part of the image and query vectors. You'll probably want a GPU for this step, but it should just about work on a CPU too.

Take the few thousand top-scoring images and run a bigger comparison net for the final ranking.

You might want an extra input to the comparison net to give result diversity, i.e. to avoid 50 very similar images all being returned at the top of the rankings.
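
A different, simpler way to get diversity is to re-rank the final candidates greedily with maximal marginal relevance (MMR). To be clear, this is a swapped-in post-processing technique, not the "extra input to the comparison net" described above. The function name, the `lam` trade-off parameter, and the toy data are all made up for illustration.

```python
import numpy as np

def mmr_rerank(vecs, relevance, top_k=5, lam=0.7):
    """Greedy maximal marginal relevance over candidate vectors.

    vecs: (n, d) candidate embeddings; relevance: (n,) scores from the ranker.
    lam=1.0 is pure relevance; lower values penalise near-duplicates more.
    """
    unit = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    sim = unit @ unit.T  # pairwise cosine similarity
    selected = []
    remaining = list(range(len(vecs)))
    while remaining and len(selected) < top_k:
        if selected:
            # Similarity of each remaining item to its closest already-picked item.
            redundancy = sim[np.ix_(remaining, selected)].max(axis=1)
        else:
            redundancy = np.zeros(len(remaining))
        mmr = lam * relevance[remaining] - (1.0 - lam) * redundancy
        best = remaining[int(np.argmax(mmr))]
        selected.append(best)
        remaining.remove(best)
    return selected

# Three near-identical shots (0, 1, 2) plus two different images (3, 4).
vecs = np.array([[1.0, 0.0], [0.99, 0.01], [0.98, 0.02], [0.0, 1.0], [0.6, 0.8]])
relevance = np.array([0.90, 0.89, 0.88, 0.70, 0.75])
picked = mmr_rerank(vecs, relevance, top_k=3, lam=0.5)
print(picked)  # [0, 3, 4]
```

With `lam=1.0` the same call returns the three near-duplicates `[0, 1, 2]`, which is exactly the failure mode being avoided.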

Then all the networks should be trained end-to-end on user behaviour, i.e. the images users actually picked as answering their query.
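
One common way to turn "the image the user picked" into a training signal is an in-batch contrastive loss: for each (query, clicked image) pair, the other images in the batch act as negatives. The sketch below only computes the loss value in NumPy, with no gradients or optimiser, and is an assumption about how such training could look rather than a description of any production system. The function name and the temperature value are illustrative.

```python
import numpy as np

def in_batch_contrastive_loss(query_vecs, image_vecs, temperature=0.07):
    """Cross-entropy where row i's positive is image i and other rows are negatives."""
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    v = image_vecs / np.linalg.norm(image_vecs, axis=1, keepdims=True)
    logits = (q @ v.T) / temperature                     # (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(2)
clicked_images = rng.standard_normal((8, 32))  # embeddings of images users picked
aligned_queries = clicked_images + 0.1 * rng.standard_normal((8, 32))
random_queries = rng.standard_normal((8, 32))

loss_aligned = in_batch_contrastive_loss(aligned_queries, clicked_images)
loss_random = in_batch_contrastive_loss(random_queries, clicked_images)
```

Queries whose embeddings sit near their clicked image give a much lower loss than unrelated ones, which is the property a training loop would push the encoders towards.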

I think this is a nice approach. You may even be able to take it further; if you're training end to end based on users' queries, you can probably have the query and image representations in the same space and use a simple similarity measure in place of the tiny neural net (something like OpenAI's CLIP model).
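
If the query and image vectors do live in the same space, the per-image work can shrink to a single matrix-vector product. A minimal sketch, assuming cosine similarity as the "simple similarity measure" and using random toy vectors in place of real embeddings:

```python
import numpy as np

def normalize(x):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def top_k_cosine(image_vecs, query_vec, k=5):
    """Return indices and scores of the k most similar images, best first."""
    # In practice the image side would be normalised once, offline.
    scores = normalize(image_vecs) @ normalize(query_vec)
    top = np.argpartition(-scores, k - 1)[:k]
    top = top[np.argsort(-scores[top])]
    return top, scores[top]

rng = np.random.default_rng(1)
images = rng.standard_normal((1000, 64))
query = images[42] + 0.1 * rng.standard_normal(64)  # a query close to image 42
idx, sims = top_k_cosine(images, query)
```

Here `idx[0]` comes out as 42, since the query was built as a slightly noisy copy of that image's vector.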

The tricky part will be scaling it -- not just for speed, but keeping the index size down. Also, you'll need to already have some version of image search to collect the training data.
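
Some back-of-the-envelope arithmetic shows why index size matters. The 1M images and 8192 elements come from the comment above; the float32 storage, the 8-bit quantisation, and the 512-dimension variant are assumptions picked purely for comparison.

```python
def index_bytes(num_images, dims, bytes_per_value):
    """Raw size of a dense embedding table, ignoring any container overhead."""
    return num_images * dims * bytes_per_value

GB = 10**9
full = index_bytes(1_000_000, 8192, 4)   # float32 at the sizes quoted above
int8 = index_bytes(1_000_000, 8192, 1)   # hypothetical 8-bit quantisation
small = index_bytes(1_000_000, 512, 1)   # hypothetical 512-dim, 8-bit vectors

for name, size in [("float32 x 8192", full), ("int8 x 8192", int8), ("int8 x 512", small)]:
    print(f"{name}: {size / GB:.3f} GB per user")
# float32 x 8192: 32.768 GB per user
# int8 x 8192: 8.192 GB per user
# int8 x 512: 0.512 GB per user
```

At float32 the full-size vectors come to roughly 33 GB for a single 1M-image account, so some mix of smaller embeddings and quantisation looks unavoidable.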