by 9dev 697 days ago
As someone working extensively with word2vec: I would recommend setting up Elasticsearch. It supports vector embeddings, so you can process your PDF documents once, write the word2vec embeddings and PDF metadata into an index, and search that index in milliseconds later on. Live vectorisation is neat for exploring data, but Elasticsearch will be much more convenient in actual products!
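To make the "index once, search later" idea concrete, here is a rough sketch of the request bodies involved. It only builds plain dicts, so it runs without a cluster. The field names (`title`, `path`, `embedding`), the 300-dimension size, and the helper names are my own assumptions for illustration; `dense_vector` and the `knn` search option are the Elasticsearch 8.x features I have in mind, so check them against the version you run.

```python
import json

# Assumption: 300 dimensions, a common word2vec size. Use whatever your model emits.
EMBEDDING_DIMS = 300


def build_index_mapping(dims=EMBEDDING_DIMS):
    """Index definition: PDF metadata plus one indexed vector field."""
    return {
        "mappings": {
            "properties": {
                "title": {"type": "text"},    # hypothetical metadata field
                "path": {"type": "keyword"},  # hypothetical metadata field
                "embedding": {
                    "type": "dense_vector",
                    "dims": dims,
                    "index": True,
                    "similarity": "cosine",
                },
            }
        }
    }


def build_document(title, path, embedding):
    """One document per PDF, written once at ingestion time."""
    return {"title": title, "path": path, "embedding": list(embedding)}


def build_knn_search(query_vector, k=10, num_candidates=100):
    """Approximate nearest-neighbour search against the stored vectors."""
    return {
        "knn": {
            "field": "embedding",
            "query_vector": list(query_vector),
            "k": k,
            "num_candidates": num_candidates,
        },
        "_source": ["title", "path"],  # return metadata only, not the vectors
    }


if __name__ == "__main__":
    print(json.dumps(build_index_mapping(), indent=2))
```

You would hand these bodies to whichever client you use (index creation, document indexing, search). The vectorisation cost is paid once per PDF at ingestion, so at query time you only embed the query itself.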

I would personally vote for Postgres and one of the many vector indexing extensions over Elasticsearch. I think Elasticsearch can be more challenging to maintain. Certainly a matter of opinion though. Elasticsearch is a very reasonable choice.
Elasticsearch is a pain to maintain, the docs are all over the place, and the API is what you end up with if developers run free and implement everything that comes to mind.

But there just isn’t anything comparable when it comes to building your own search engine. Postgres with a vector extension is good if all you want to do is a vector search and some SQL (not dismissing it here, I love PG); but if you want more complex search cases, Elasticsearch is the way to go.
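As one example of what I mean by a more complex search case: a single request that mixes full-text matching, a metadata filter, and vector similarity. This is only a sketch that builds the request body as a dict. The field names (`title`, `file_type`, `embedding`) and the helper name are made up for illustration, and combining a top-level `query` with `knn` is how I understand Elasticsearch 8.x to work, so verify it against your version.

```python
def build_hybrid_search(text, query_vector, file_type, k=10):
    """Full-text match plus filter, combined with approximate kNN on a vector field.

    Assumption: the index has a text field `title`, a keyword field `file_type`,
    and an indexed dense_vector field `embedding` (all hypothetical names).
    """
    return {
        # Lexical side: relevance-scored text match, narrowed by an exact filter.
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],
                "filter": [{"term": {"file_type": file_type}}],
            }
        },
        # Vector side: nearest neighbours of the query embedding.
        "knn": {
            "field": "embedding",
            "query_vector": list(query_vector),
            "k": k,
            # num_candidates must be at least k; 100 is an arbitrary floor.
            "num_candidates": max(100, k),
        },
        "size": k,
    }


if __name__ == "__main__":
    body = build_hybrid_search("quarterly report", [0.1, 0.2, 0.3], "pdf", k=5)
    print(sorted(body.keys()))
```

As far as I know, hits that match both the text query and the kNN clause get their scores combined, which is the sort of thing that gets awkward to express in plain SQL.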