TL;DR:
I have a hunch that demand for classic RAG (embeddings + vector DB) will shrink. Reasons:
1. Embedding ops cost (re-indexing, freshness) is high.
2. LLMs are getting good at iterative query expansion over plain search APIs (BM25-style).
3. Embedding quality is still uneven across domains/languages.
Curious what you are actually seeing in production.

Context:
We’re a ~10-person team inside a large company. People use different UIs (ChatGPT, Claude, Dify, etc.). Cost and security aren’t our main issues; we just want higher throughput. We can wire up MCP-style connectors (Notion/Slack/Drive) or run our own vector index, and we’re trying to pick the battles that really move the needle.

Hypotheses I’m testing:
* For fast-changing corp knowledge, BM25 + LLM query expansion + light re-ranking beats maintaining a vector store (lower ops, decent recall).
* MCP/API search gives “good enough” docs if you union a few expanded queries and re-rank.
* Vectors still win for long-tail semantic matches and noisy phrasing, but only when content is relatively stable or you can afford frequent re-embeds.

What I want from HN (war stories, not vendor pitches):
1. Have you sunset or avoided vector DBs because ops/freshness pain outweighed gains? What were the data size, update rate, and latency targets?
2. If you kept vectors, what made them clearly superior (metrics, error classes, language/domain)? Any concrete thresholds (docs/day churn, avg doc length, query mix) where vectors start paying off?
3. Anyone running pure API search + LLM query expansion (multi-query, aggregation, re-rank) at scale? How many queries per task? Latency/cost vs. vector search?
4. Hybrid setups that worked: e.g., API search to narrow → vector re-rank; or vector recall → LLM judge → final set. What cut false positives/negatives the most?
5. Multilingual/Japanese/domain jargon: where do embeddings still fail you? Did re-ranking (LLM or classic) fix it?
6. Freshness strategies without vectors: caching, recency boosts, metadata filters? What actually reduced “stale answer” complaints?
7. For MCP-style connectors (Notion/Slack/Drive): do you rely on vendor search, or do you replicate content and index it yourself? Why?
8. If you were starting from scratch today for a 10-person team, what baseline would you ship first?

Why I’m asking:
Our goal is throughput (less time hunting, more time shipping). I’m leaning toward:
* Phase 1: MCP/API search + LLM query expansion (3–5 queries), union top-N, local re-rank; no vectors.
* Phase 2 (only if needed): add a vector index for the failure cases we can’t fix with expansion/re-rank.

Happy to share a summary of takeaways after the thread. Thanks!
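
P.S. To make Phase 1 concrete, here is a minimal Python sketch of the shape I have in mind, not something we run. Everything in it is an assumption for illustration: `CORPUS`, `expand_query`, `keyword_search`, `fuse`, and `retrieve` are hypothetical names; the expansion step is a hard-coded synonym table standing in for an LLM call; the search is naive term overlap standing in for a vendor/MCP search API; and reciprocal rank fusion (with the commonly used constant k=60) is my stand-in for the "local re-rank" step, where a cross-encoder or LLM judge could be swapped in.

```python
from collections import defaultdict

# Toy corpus standing in for Notion/Slack/Drive content (hypothetical data).
CORPUS = {
    "doc1": "How to rotate API keys for the billing service",
    "doc2": "Billing service on-call runbook and escalation",
    "doc3": "Quarterly planning notes for the design team",
    "doc4": "Rotating credentials and secrets: step-by-step guide",
}


def expand_query(query: str) -> list[str]:
    """Placeholder for the LLM query-expansion step.

    A real version would ask a model for 3-5 rephrasings; this stub only
    swaps words using a fixed synonym table so the example is deterministic.
    """
    synonyms = {"rotate": "rotating", "keys": "credentials", "api": "secrets"}
    swapped = " ".join(synonyms.get(w, w) for w in query.lower().split())
    variants = [query]
    if swapped != query.lower():
        variants.append(swapped)
    return variants


def keyword_search(query: str, top_n: int = 3) -> list[str]:
    """Placeholder for a vendor/MCP search call: ranks docs by term overlap."""
    terms = set(query.lower().split())
    scored = []
    for doc_id, text in CORPUS.items():
        tokens = set(text.lower().replace(":", " ").split())
        overlap = len(terms & tokens)
        if overlap:
            scored.append((overlap, doc_id))
    # Highest overlap first; doc id breaks ties so results are stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_n]]


def fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Union several ranked lists with reciprocal rank fusion (RRF)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


def retrieve(query: str, top_n: int = 3) -> list[str]:
    """Phase 1 shape: expand, search each variant, union top-N, re-rank."""
    rankings = [keyword_search(q, top_n) for q in expand_query(query)]
    return fuse(rankings)[:top_n]


if __name__ == "__main__":
    print(keyword_search("rotate api keys"))  # literal query only
    print(retrieve("rotate api keys"))        # expanded + fused
```

With this toy data, the literal query only surfaces doc1, while the expanded variant also pulls in doc4. That is the effect I am hoping query expansion gives us without vectors; whether it holds on real, messy corp content is exactly what I am asking about.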