|
|
|
|
|
by civeng
81 days ago
|
|
Great write-up. Thank you! I’m contemplating a similar RAG architecture for my engineering firm, but we’re dealing with roughly 20x the data volume (estimating around 9TB of project files, specs, and PDFs).
I've been reading about Google's new STATIC framework (sparse matrix constrained decoding) and am really curious about the shift toward generative retrieval for massive speedups well beyond this approach.
For those who have scaled RAG into the multi-terabyte range: is it actually worth exploring generative retrieval approaches like STATIC to bypass standard dense vector search, or is a traditional sharded vector DB (Milvus, Pinecone, etc.) still the most practical path at this scale? I would guess the ingestion pain is still the same. This new world is astounding. |
|
you could probably use the hybrid search in llamaindex; or elasticsearch. there is an off the shelf discovery engine api on gcp. vertex rag engine is end to end for building your own. gcp is too expensive though. alibaba cloud have a similar solution.