by ambuj_tripathi
105 days ago
First off, building a local LangGraph RAG at 18 is a massive achievement. Great work! I recently challenged myself to architect a multi-agent LangGraph pipeline on an extremely constrained 512MB RAM free-tier server, so I totally understand your VRAM/RAM pain. Here is some architecture feedback based on your questions:

1. Small-VRAM Architecture (Parent-Child Chunking):
If you are running out of memory, don't keep large text chunks in your vector DB. Implement strict parent-child chunking. I embed only tiny 'child' chunks into Qdrant (or whichever vector DB you use) and store the large 'parent' text payloads in a lightweight DB like SQLite or Supabase. Search small, retrieve large.

2. Routing & Skipping Heavy Rerankers:
I completely agree with using a single LLM call for routing to save compute. In my setup, I deliberately skipped heavy cross-encoder rerankers because they absolutely destroy free-tier/low-VRAM budgets. Add one only when you are not resource-constrained.

3. LangGraph Memory Management:
Leverage LangGraph's state machine to avoid OOM crashes. Don't try to hold the Evaluator and Generator contexts in memory simultaneously. Sequence your nodes with conditional edges (Generator -> Evaluator -> route back if failed). Running them one after another means only one node's context has to be resident at a time, provided each node releases what it loaded before the next one starts, so you never overload your VRAM at any single tick.

Keep building, you have a very solid foundation here!
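To make point 1 concrete, here is a rough, standard-library-only sketch of "search small, retrieve large". Everything in it is an assumption for illustration: the `ParentChildStore` class, the `split_children` helper, and the toy bag-of-words `embed` function are hypothetical stand-ins. In a real setup the child vectors would go to Qdrant (or a similar vector DB) and come from a real embedding model; only the parent bodies would live in SQLite.

```python
import math
import re
import sqlite3
from collections import Counter
from typing import List, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stands in for a real model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split_children(text: str, size: int = 8) -> List[str]:
    """Cut a parent text into small word windows (the 'child' chunks)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


class ParentChildStore:
    """Hypothetical store: child vectors in a list, parent bodies in SQLite."""

    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE parents (id INTEGER PRIMARY KEY, body TEXT)")
        # Each entry is (child vector, parent id); the child text itself is not kept.
        self.index: List[Tuple[Counter, int]] = []

    def add(self, parent_text: str) -> int:
        cur = self.db.execute("INSERT INTO parents (body) VALUES (?)", (parent_text,))
        parent_id = cur.lastrowid
        for child in split_children(parent_text):
            self.index.append((embed(child), parent_id))
        return parent_id

    def search(self, query: str) -> Optional[str]:
        """Match against the small children, then return the full parent body."""
        if not self.index:
            return None
        q = embed(query)
        _, parent_id = max(self.index, key=lambda item: cosine(q, item[0]))
        row = self.db.execute(
            "SELECT body FROM parents WHERE id = ?", (parent_id,)
        ).fetchone()
        return row[0] if row else None


store = ParentChildStore()
store.add("Qdrant stores vectors for similarity search. Payloads can be kept "
          "elsewhere to save memory on small servers.")
store.add("SQLite is an embedded relational database. It keeps data in a "
          "single file on disk.")
result = store.search("embedded relational database")
```

The point of the split is that the similarity index only ever holds the small child vectors plus a parent id, while the bulky parent text sits on disk and is fetched by id only for the hit you actually use.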
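The control flow from point 3 can be sketched without any framework. This is plain Python, not LangGraph's API: the `route` function plays the role a conditional edge would play in a graph, and `MemoryTracker`, `generator_node`, `evaluator_node`, and the "accept on the second attempt" rule are all hypothetical placeholders for real model loading and real evaluation.

```python
from contextlib import contextmanager


class MemoryTracker:
    """Hypothetical stand-in that records which 'models' are resident."""

    def __init__(self):
        self.resident = set()
        self.peak = 0

    @contextmanager
    def loaded(self, name):
        # Pretend to load a model; always release it when the node finishes.
        self.resident.add(name)
        self.peak = max(self.peak, len(self.resident))
        try:
            yield name
        finally:
            self.resident.discard(name)


tracker = MemoryTracker()


def generator_node(state):
    with tracker.loaded("generator"):
        attempt = state["attempts"] + 1
        # Placeholder for the actual LLM generation call.
        draft = f"draft {attempt} for: {state['question']}"
    return {**state, "attempts": attempt, "draft": draft}


def evaluator_node(state):
    with tracker.loaded("evaluator"):
        # Toy rule (assumption): the first draft fails, later drafts pass.
        passed = state["attempts"] >= 2
    return {**state, "passed": passed}


def route(state, max_attempts=3):
    """Decide the next step, as a conditional edge would in a graph."""
    if state["passed"] or state["attempts"] >= max_attempts:
        return "end"
    return "generator"


def run(question):
    state = {"question": question, "attempts": 0, "draft": "", "passed": False}
    next_node = "generator"
    while next_node != "end":
        state = generator_node(state)   # generator runs and is released
        state = evaluator_node(state)   # only then does the evaluator load
        next_node = route(state)
    return state


final = run("What is parent-child chunking?")
```

The `max_attempts` cap is there so a draft that keeps failing cannot loop forever. The memory benefit comes from each node releasing its model before the next one loads, not from the sequencing alone, so the same release step is needed inside real graph nodes.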