Hacker News new | ask | show | jobs
by Immortal333 2147 days ago
This is great. This is already better that FLANN (by facebook), Annoy (by spotify). https://github.com/erikbern/ann-benchmarks#evaluated

Still, We are very far away from creating reverse video search engine. Also, Most of these techniques are in-memory not on disk.

I am really interested in making reverse video search engine.

5 comments

Not quite the same thing, but videogrep https://github.com/antiboredom/videogrep works along, audio -> ASR -> timed transcript -> text search, lines.
I'm working on an ANN plugin for Elasticsearch. All data is stored on disk, you automatically get horizontal/distributed scaling handled by ES, and you can combine ANN queries with Elasticsearch queries. http://elastiknn.klibisz.com/

It's currently not as fast as the in-memory alternatives. Though it's not a perfect apples/apples comparison. Data is stored on disk, it's a JVM implementation rather than C/C++, and it's optimized for single queries rather than bulk.

YSK that ANN is already in the process of being added to Lucene:

https://github.com/elastic/elasticsearch/issues/42326

A different implementation is already in OpenDistro for Elasticsearch:

https://opendistro.github.io/for-elasticsearch-docs/docs/knn...

Yep, I'm aware.

The Lucene implementation seems early and slow-moving. Seems they are trying to create new storage formats and use graph-based search methods. OpenDistro wrapped a C++ binary that also uses a graph-based method. It works quite well, but only for L2 similarity and comes with the operational burden of running a rather large sidecar process completely disjoint from the JVM.

The approach I've taken is to support five similarity functions (L1, L2, Angular, Jaccard, Hamming), support sparse and dense vectors, implement everything inside the JVM with no sidecar processes and no changes to Lucene, and to use hashing-based search methods (i.e. LSH). IMO the last point has a clear advantage over using graph-based methods, because the hashes are treated just like regular text tokens which is clearly the optimal access pattern in ES/Lucene. Of course it will likely lose to a C++ implementation in terms of raw speed because it's the JVM, but IMO that matters less than making the plugin trivial to run and scale.

I don't think there's a definitively better approach yet. It's an interesting problem and it'll be interesting to see what ends up working well.

MPEG-7 has been working on this for decades. ffmpeg has an implementation you can use: https://ffmpeg.org/ffmpeg-all.html#signature-1 While this does pairwise comparisons, it should be possible to simply index the resulting fingerprints using lucene and perform inverted index search - this should scale much better than this technique mentioned.
How would a reverse video search engine work? You give it a clip and it shows you which longer clips it is in?
I would imagine it using a Shazam-like algorithm, only for downsampled video and not just the audio.

A decade ago there was some hobby implementations of these algorithms that got cease and desist letters form Shazam and that was all over HN.

https://m.youtube.com/watch?v=8Dj0rekeM7g

Yes, Exactly same as you described.
Could you take (embeddings of) frames from your query video and measure the pairwise similarity to frames from the reference videos, then rank by some summary function of that list of similarity scores?

What are the bottlenecks besides performance?

FLANN was developed at UBC. You meant FAISS.
Yes, FAISS not FLANN. my mistake. cann't correct it now. because edit time is gone.