by ambuj_tripathi 105 days ago
First off, building a local LangGraph RAG at 18 is a massive achievement. Great work!

I recently challenged myself to architect a multi-agent LangGraph pipeline on an extremely constrained 512MB RAM free-tier server, so I totally understand your VRAM/RAM pain. Here is some architecture feedback based on your questions:

1. Small-VRAM Architecture (Parent-Child Chunking): If you are running out of memory, don't keep large text chunks in your vector DB. Implement strict Parent-Child chunking: I embed only tiny 'child' chunks into Qdrant (or another vector DB) and store the large 'parent' text payloads in a lightweight DB like SQLite/Supabase. Search small, retrieve large.
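A rough sketch of the "search small, retrieve large" idea, assuming word-count chunking and a toy bag-of-words vector in place of a real embedding model. The in-memory list stands in for a Qdrant collection. In a real setup each child point would carry its parent id in its payload. `ParentChildStore`, `embed`, and the chunk sizes are all hypothetical names and values, not anything from a specific library:

```python
import math
import sqlite3
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    # Toy bag-of-words vector; a hypothetical stand-in for a real embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split_words(text: str, size: int) -> List[str]:
    # Naive fixed-size chunking by word count (assumption; real splitters are smarter).
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


class ParentChildStore:
    """Embed small child chunks; keep the full parent text in SQLite."""

    def __init__(self, parent_size: int = 200, child_size: int = 20):
        self.parent_size = parent_size
        self.child_size = child_size
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE parents (id INTEGER PRIMARY KEY, body TEXT)")
        # Stand-in for the vector DB: only (child vector, parent id) pairs live here.
        self.children: List[Tuple[Counter, int]] = []

    def ingest(self, document: str) -> None:
        for parent in split_words(document, self.parent_size):
            parent_id = self.db.execute(
                "INSERT INTO parents (body) VALUES (?)", (parent,)
            ).lastrowid
            for child in split_words(parent, self.child_size):
                self.children.append((embed(child), parent_id))

    def search(self, query: str, k: int = 2) -> List[str]:
        """Rank child vectors, then return up to k distinct parent texts."""
        q = embed(query)
        ranked = sorted(self.children, key=lambda c: cosine(q, c[0]), reverse=True)
        parent_ids: List[int] = []
        for _, parent_id in ranked:
            if parent_id not in parent_ids:
                parent_ids.append(parent_id)
            if len(parent_ids) == k:
                break
        return [
            self.db.execute("SELECT body FROM parents WHERE id = ?", (pid,)).fetchone()[0]
            for pid in parent_ids
        ]
```

The vector side only ever holds short child vectors plus an integer id. The large text is fetched from SQLite by primary key after the ranking is done.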

2. Routing & Skipping Heavy Rerankers: I completely agree with using a single LLM call for routing to save compute. In my setup, I deliberately skipped heavy cross-encoder rerankers because they absolutely destroy free-tier/low-VRAM budgets (use them when you are not resource-constrained).
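Here is a minimal sketch of both ideas. It uses one LLM call that returns a route label, and context selection that falls back to plain vector-score ordering when no reranker is supplied. The route labels, the prompt wording, and the `llm` / `reranker` callables are hypothetical placeholders for whatever model client you actually use:

```python
from typing import Callable, List, Optional, Sequence, Tuple

ROUTES = ("retrieve", "web_search", "direct")  # hypothetical route labels

ROUTER_PROMPT = (
    "Classify the user question into exactly one of: {routes}.\n"
    "Reply with the label only.\n\nQuestion: {question}"
)


def route(question: str, llm: Callable[[str], str], default: str = "retrieve") -> str:
    """One LLM call picks the route; an unparseable reply falls back to `default`."""
    reply = llm(ROUTER_PROMPT.format(routes=", ".join(ROUTES), question=question))
    label = reply.strip().lower()
    return label if label in ROUTES else default


def select_context(
    question: str,
    hits: Sequence[Tuple[float, str]],
    k: int = 3,
    reranker: Optional[Callable[[str, str], float]] = None,
) -> List[str]:
    """Keep the top-k hits by vector score; rerank only if a reranker is passed in."""
    if reranker is None:
        ordered = sorted(hits, key=lambda h: h[0], reverse=True)
    else:
        ordered = sorted(hits, key=lambda h: reranker(question, h[1]), reverse=True)
    return [text for _, text in ordered[:k]]
```

On a constrained box you call `select_context(question, hits)` with no reranker. When you have headroom, you pass a cross-encoder scoring function as `reranker` without touching the rest of the pipeline.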

3. LangGraph Memory Management: Leverage LangGraph's state machine to avoid OOM crashes. Don't try to hold Evaluator and Generator contexts in memory simultaneously. Sequence your nodes with conditional edges (Generator -> Evaluator -> Route back if failed). By doing this sequentially, you never overload your VRAM at any single tick.

Keep building, you have a very solid foundation here!