by nchmy 479 days ago

Fantastic insights again, thank you.

I've been developing a sort of educational platform for a few years and went deep on search stuff a couple years ago, before moving my focus elsewhere.

This was just prior to the LLM boom, and I had concluded that it would probably be best to mostly avoid vector search and instead use some sort of transformer model to extract summaries, keywords, etc. from each document, store them in a meta field, then just do normal, sparse BM25 on it all. Even for image search etc. - just extract keywords rather than dense embeddings.
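
To make that concrete, here is a rough sketch of the idea in plain Python. It is not SPLADE and not anyone's production setup: the `meta` strings are hard-coded stand-ins for whatever a model would extract, and the `BM25Index` class, the field names, and the `k1`/`b` defaults are all assumptions made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real setup would use a proper analyzer.
    return text.lower().split()


class BM25Index:
    """Tiny in-memory BM25 over each document's body plus its meta field."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        # Index the original text together with the extracted keywords.
        self.tokens = [tokenize(d["body"] + " " + d["meta"]) for d in docs]
        self.tf = [Counter(t) for t in self.tokens]
        self.avgdl = sum(len(t) for t in self.tokens) / len(self.tokens)
        n = len(docs)
        df = Counter(term for t in self.tokens for term in set(t))
        # Common smoothed IDF variant that stays non-negative.
        self.idf = {
            term: math.log(1 + (n - f + 0.5) / (f + 0.5)) for term, f in df.items()
        }

    def score(self, query, i):
        dl = len(self.tokens[i])
        total = 0.0
        for term in tokenize(query):
            f = self.tf[i].get(term, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def search(self, query, top_k=3):
        # Returns (score, doc_index) pairs, best first, dropping non-matches.
        scored = [(self.score(query, i), i) for i in range(len(self.tokens))]
        scored = [(s, i) for s, i in scored if s > 0]
        return sorted(scored, key=lambda pair: -pair[0])[:top_k]


# Hypothetical documents; "meta" is what a model would have extracted offline.
docs = [
    {"body": "IMG_0042.jpg", "meta": "golden retriever dog beach sunset"},
    {"body": "Intro to photosynthesis for grade 8", "meta": "biology plants chlorophyll light energy"},
    {"body": "Notes on Roman aqueducts", "meta": "history engineering water rome"},
]
index = BM25Index(docs)
results = index.search("dog beach")
```

The image entry has no useful text of its own, so it only becomes findable through the extracted keywords, which is the whole point. In practice you would let an existing search engine do the BM25 part and only build the extraction step yourself.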

SPLADE was one promising example back then. https://github.com/naver/splade

I'm sure the field has progressed since then, but it sounds like it is still best to not invest in vector search.

The real lesson, it seems, is we need to know our needs, data, etc and act accordingly - most apparently do not do that.