Hacker News
by samlinnfer 582 days ago
How does it work for code? (Chunking code, that is.)
2 comments

Poorly, just like it does for text.

Chunking is easily where all of these approaches fall apart beyond PoC scale.

I’ve talked to multiple code generation companies in the past week; most are stuck with BM25 and taking in whole files.

What do they use BM25 for? RAG?
Correct: finding the right functions and files to include.
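
For anyone unfamiliar with what "BM25 over whole files" looks like in practice, here is a minimal sketch, not what any of these companies actually run. The file names, the file contents, the tokenizer, and the `k1`/`b` defaults are all assumptions made up for illustration; the scoring is the standard Okapi BM25 formula with the common "+1 inside the log" IDF variant.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: pull out identifier-like tokens and lowercase them.
    return [t.lower() for t in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text)]

def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Rank whole files against a query; returns (name, score), best first."""
    tokenized = {name: tokenize(body) for name, body in docs.items()}
    n_docs = len(tokenized)  # assumes at least one non-empty file
    avg_len = sum(len(t) for t in tokenized.values()) / n_docs

    # Document frequency: in how many files does each term appear?
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    query_terms = set(tokenize(query))
    scores = {}
    for name, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: long files need more hits to score as high.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[name] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Made-up "repository": each value is an entire file, with no chunking at all.
files = {
    "auth.py": "def login(user, password): return check_password(user, password)",
    "billing.py": "def charge(card, amount): return gateway.charge(card, amount)",
    "utils.py": "def slugify(title): return title.lower().replace(' ', '-')",
}
ranking = bm25_rank("check user password on login", files)
```

Note that the query word "check" does not match `check_password` here, because the tokenizer keeps identifiers whole. Purely lexical matching like this depends heavily on how code gets tokenized.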
Right now, we haven't worked on adding support for code: things like comments (#, //) contain punctuation that adversely affects chunking, and indentation and other issues get in the way too.

But, it's on the roadmap, so please hold on!
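
To make the comment-punctuation problem concrete, here is a small sketch using a hypothetical sentence-boundary splitter. It is an assumption for illustration only, not the library's actual chunking logic, and the code snippet being split is made up.

```python
import re

def naive_sentence_chunks(text):
    # Hypothetical prose-style splitter: break after '.', '!' or '?'
    # whenever whitespace follows.
    return [c for c in re.split(r"(?<=[.!?])\s+", text) if c]

code = "x = 1  # Start at one. Bump it later!\ny = x + 1\n"
chunks = naive_sentence_chunks(code)
for chunk in chunks:
    print(repr(chunk))
```

The period inside the comment is treated as a sentence boundary, so the second half of the comment becomes its own chunk with no `#` in front of it and no longer reads as a comment.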