| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by Ey7NFZ3P0nzAe 498 days ago

I recently built a semantic batching function for my RAG system [wdoc](https://github.com/thiswillbeyourgithub/wdoc/) that might be interesting to others. The system splits a corpus into chunks, finds relevant ones via embeddings, and answers questions for each chunk in parallel before aggregating the answers.

To optimize performance and reduce LLM distraction, instead of aggregating answers two by two, it does batched aggregation. The key innovation is in the batching order - I implemented a [semantic_batching function](https://github.com/thiswillbeyourgithub/wdoc/blob/18bc52128f...) that uses hierarchical clustering on the embeddings and orders texts by leaf order.

The implementation was straightforward, runs very fast and produces great results. The function is designed to be usable as a standalone tool for others to experiment with.