by ghita_ 580 days ago
That's cool! How does it perform compared to more "naive" methods? How did you go about comparing that performance, and was it in a real world RAG?

Yep, the benchmarks are at https://github.com/ZeroEntropy-AI/llama-chunk?tab=readme-ov-... and we used the https://github.com/ZeroEntropy-AI/legalbenchrag dataset, which is a retrieval-focused version of LegalBench.

It scored better than LlamaIndex's recursive character text splitter, even with some custom regex work added to improve the splitter. If you put enough effort into the regex you could probably get there, but the whole point of agentic chunking is that it's automatic and contextual.
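
For anyone unfamiliar with the baseline being compared against: a recursive character splitter tries the coarsest separator first (paragraph breaks), packs the pieces into chunks up to a size limit, and only falls back to finer separators (lines, sentences, words) for pieces that are still too long. Here's a rough sketch of that idea. To be clear, this is not LlamaIndex's implementation and not the benchmark code; the name `recursive_split`, the separator list, and `max_len` are all made up for illustration.

```python
def recursive_split(text, max_len=200, separators=("\n\n", "\n", ". ", " ")):
    """Split text into chunks of at most max_len characters.

    Hypothetical illustration of the "recursive character" baseline:
    coarse separators are tried first, finer ones only when needed.
    """
    if len(text) <= max_len:
        return [text] if text.strip() else []

    for i, sep in enumerate(separators):
        if sep not in text:
            continue
        chunks, current = [], ""
        for part in text.split(sep):
            candidate = current + sep + part if current else part
            if len(candidate) <= max_len:
                # Still fits: keep packing pieces into the current chunk.
                current = candidate
                continue
            if current:
                chunks.append(current)
            if len(part) > max_len:
                # This piece alone is too long: recurse with finer separators.
                chunks.extend(recursive_split(part, max_len, separators[i + 1:]))
                current = ""
            else:
                current = part
        if current:
            chunks.append(current)
        # Drop whitespace-only chunks.
        return [c for c in chunks if c.strip()]

    # No separator applies: fall back to a hard cut every max_len characters.
    return [text[j:j + max_len] for j in range(0, len(text), max_len)]
```

The splitter never looks at what the text means, only at where the separators fall. That's why the baseline needs hand-tuned regex to respect things like clause or section boundaries in legal documents.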