by siamese_puff 657 days ago
Not an expert, but OP is right, and this is a generally known issue with large context windows and RAG. Small chunks usually work best, and how you chunk matters too. OP, what's the best way to parse/chunk code snippets?

You can use the AST to chunk the code: https://docs.sweep.dev/blogs/chunking-2m-files
We're actually using an improved version of this exact blog post. We started from there, but weren't happy that some of the chunks were really small (and would undeservedly get surfaced to the top). So we added some extra logic to merge sibling nodes when they're small.

https://github.com/Storia-AI/repo2vec/blob/1864102949e720320...
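
For anyone who wants to see the "chunk by AST, then merge small siblings" idea in code, here is a minimal sketch. It is not the code from the blog post or from repo2vec: it assumes Python-only input, uses the standard library `ast` module so it runs without extra dependencies, and only looks at top-level nodes. The names `node_start` and `chunk_source` and the `min_chars` / `max_chars` thresholds are made up for illustration.

```python
import ast


def node_start(node: ast.stmt) -> int:
    """0-based index of the first source line of a top-level node, decorators included."""
    decorators = getattr(node, "decorator_list", [])
    return min([node.lineno] + [d.lineno for d in decorators]) - 1


def chunk_source(source: str, min_chars: int = 200, max_chars: int = 1500) -> list[str]:
    """Split Python source at top-level AST nodes, merging small adjacent siblings.

    min_chars and max_chars are illustrative thresholds, not values from any library.
    """
    tree = ast.parse(source)
    lines = source.splitlines(keepends=True)
    starts = [node_start(n) for n in tree.body]
    if not starts:
        return [source] if source.strip() else []

    # Each piece runs from the start of one node to the start of the next,
    # so comments and blank lines between nodes are kept with the node before them.
    starts[0] = 0  # leading comments go with the first node
    bounds = starts + [len(lines)]
    pieces = ["".join(lines[a:b]) for a, b in zip(bounds, bounds[1:])]
    pieces = [p for p in pieces if p]  # drop empty spans (several statements on one line)

    chunks: list[str] = []
    current = ""
    for piece in pieces:
        either_small = len(current) < min_chars or len(piece) < min_chars
        fits = len(current) + len(piece) <= max_chars
        if current and either_small and fits:
            current += piece  # merge a small sibling into its neighbor
        else:
            if current:
                chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

With this, a file that has a few imports and a couple of tiny helper functions followed by one long function would come back as two chunks instead of four, so the tiny helpers no longer show up as standalone search results. One limitation of the sketch: a node larger than `max_chars` is left as a single chunk; splitting it would need a recursive pass over its children.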