by amirkabbara
264 days ago
At papr.ai, we've been building agentic 'RAG' pipelines for 3+ years and have tried almost every new way to let agents search for info: keyword search, grep, regex, text2sql, BM25, semantic vector DBs, knowledge graphs, etc. We confirmed the obvious thing, which is that the best approach depends on your use case:
1. keyword/grep/regex/BM25 works best (fastest, cheapest, most accurate) when you know exactly what you're looking for.
2. semantic search works best with unstructured data when you're not exactly sure what you're looking for.
3. text2sql works best when you have a few pre-defined queries with limited joins the agent can use to fetch structured data.
4. knowledge graphs work best when you need to find info across unstructured + structured data in ways that go beyond semantic similarity (e.g. find arXiv reports by author X that discuss novel knowledge graph methods, were published in the past 3 years, and don't mention Neo4j).

So we ended up building a simple add/search API that predicts where the data should come from and what the user will need this week/today, and caches it. It's accurate and it's fast.
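
To make the four cases above concrete, here is a minimal sketch of a router that picks a retrieval strategy per request. It is not the papr.ai API: `Request`, `Strategy`, and `pick_strategy` are hypothetical names, and the hand-written rules stand in for whatever prediction a real system would use.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Strategy(Enum):
    KEYWORD = "keyword"        # grep / regex / BM25
    SEMANTIC = "semantic"      # vector similarity search
    TEXT2SQL = "text2sql"      # one of a few pre-defined SQL queries
    KNOWLEDGE_GRAPH = "graph"  # free text combined with structured constraints


@dataclass
class Request:
    """Hypothetical request shape; every field here is an assumption."""
    text: str
    exact_terms: List[str] = field(default_factory=list)   # IDs, error codes, quoted strings
    sql_template: Optional[str] = None                      # name of a matching pre-defined query
    filters: Dict[str, str] = field(default_factory=dict)   # e.g. {"author": "x"}
    exclusions: List[str] = field(default_factory=list)     # terms results must not mention


def pick_strategy(req: Request) -> Strategy:
    # Case 4: free text plus structured constraints or exclusions
    # goes beyond what semantic similarity alone can express.
    if req.filters or req.exclusions:
        return Strategy.KNOWLEDGE_GRAPH
    # Case 3: a pre-defined query already covers this request.
    if req.sql_template is not None:
        return Strategy.TEXT2SQL
    # Case 1: the caller knows exactly what to look for.
    if req.exact_terms:
        return Strategy.KEYWORD
    # Case 2: fuzzy intent over unstructured data.
    return Strategy.SEMANTIC


if __name__ == "__main__":
    req = Request(
        text="novel knowledge graph methods",
        filters={"author": "x", "published_after": "3 years ago"},
        exclusions=["neo4j"],
    )
    print(pick_strategy(req).value)
```

The order of the checks is a design choice in this sketch: the most constrained case is tested first, so a request carrying both exact terms and structured filters still goes to the graph path.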