by deepdarkforest
393 days ago
Terrible stuff and a reddish flag. First of all, there are GPT signs all over the blog post; it reads like a bottom-of-the-barrel LinkedIn post. But more importantly, why double and triple down on no RAG? Like most techniques, it has its merits in certain scenarios. I understand that once you take VC money you have to prove differentiation and conviction in your approach, but why do it like this? What if RAG does end up being useful? You'll just have to admit you were wrong and Cursor and others were right? I don't get it. Just say "we don't believe RAG is that useful for now, so we're taking a different approach." Tripling down on a technique this early in such a new field seems immature to me. It screams of wanting to look different for the sake of it.
RAG is useful for natural text because there is no innate logic in how it's structured. Chunking natural language on punctuation doesn't work well because people use punctuation pretty poorly, and the models used in RAG are too small to learn how to do the chunking themselves.
Source code, unlike natural text, comes with a grammar that must be followed for it to even run. Between being able to find a definition deterministically and having explicit code blocks, you've gotten rid of 90% of the reasons you need chunking and ranking in RAG systems.
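To make the "find a definition deterministically" point concrete, here is a minimal sketch for Python source using only the standard library's `ast` module (Python 3.8+). This is not etags and not what any particular tool does; `find_definition` and `function_chunks` are hypothetical helper names. The point is that the parser hands you exact, grammar-aligned chunk boundaries with no ranking step at all.

```python
import ast
from typing import List, Optional, Tuple

# Node types that open a named, self-contained scope in Python's grammar.
_DEF_NODES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def find_definition(source: str, name: str) -> Optional[str]:
    """Return the exact source text of a def/class called `name`.

    ast.walk is breadth-first, so the shallowest, earliest match wins.
    Decorators are not included: the node's position starts at `def`/`class`.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, _DEF_NODES) and node.name == name:
            return ast.get_source_segment(source, node)
    return None


def function_chunks(source: str) -> List[Tuple[str, int, int]]:
    """Split a module into (name, first_line, last_line) chunks, one per function."""
    tree = ast.parse(source)
    chunks = [
        (node.name, node.lineno, node.end_lineno)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    # ast.walk is breadth-first, so sort by start line to get file order.
    return sorted(chunks, key=lambda c: c[1])
```

Nested functions show up as chunks that overlap their parent; whether you keep or drop those is a design choice, but either way the boundaries come straight from the grammar rather than from a heuristic.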
Just by using etags with a rule that captures the full scope of each function, I've gotten much better than SOTA results when working with large existing code bases. Of course, the fact that I was working in Lisp made dealing with code blocks and context essentially trivial. If you want to look at blub languages like Python and JavaScript, you need a whole team of engineers to deal with all the syntactic cancer.
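For Lisp, the "essentially trivial" part is that a top-level form is just a balanced pair of parentheses. Here is a rough sketch in Python (a hypothetical helper, not etags or its regex rules) that splits Lisp source into top-level forms by counting parens. It assumes fairly plain code: it skips `;` line comments and `"..."` strings with backslash escapes, but does not handle `#| |#` block comments or character literals.

```python
from typing import List, Optional, Tuple


def top_level_forms(source: str) -> List[Tuple[int, int, str]]:
    """Return (start_line, end_line, text) for each top-level parenthesised form.

    Assumption: no #| |# block comments and no character literals in the input.
    """
    forms: List[Tuple[int, int, str]] = []
    depth = 0
    start: Optional[int] = None
    start_line = 0
    line = 1
    in_string = False
    in_comment = False
    i = 0
    while i < len(source):
        ch = source[i]
        if ch == "\n":
            line += 1
            in_comment = False  # ; comments end at the newline
        elif in_comment:
            pass
        elif in_string:
            if ch == "\\":
                i += 1  # skip the escaped character
                if i < len(source) and source[i] == "\n":
                    line += 1
            elif ch == '"':
                in_string = False
        elif ch == ";":
            in_comment = True
        elif ch == '"':
            in_string = True
        elif ch == "(":
            if depth == 0:
                start, start_line = i, line  # a new top-level form begins
            depth += 1
        elif ch == ")" and depth > 0:  # ignore stray closers at top level
            depth -= 1
            if depth == 0 and start is not None:
                forms.append((start_line, line, source[start:i + 1]))
                start = None
        i += 1
    return forms
```

Each returned form is already a complete unit (a `defun`, `defvar`, etc.), which is roughly the chunk a function-scoped tag rule would give you, with no punctuation heuristics involved.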