by anotherpaulg
705 days ago
I agree that many AI coding tools have rushed to adopt naive RAG on code. Have you done any quantitative evaluation of your wiki-style code summaries? My first impression is that they might be too wordy and might not deliver valuable context in a token-efficient way.

Aider uses a repository map [0] to deliver code context. Relevant code is identified using a graph optimization on the repository's AST and call graph, not vector similarity as is typical with RAG. The repo map shows the selected code within its AST context.

Aider currently holds the second-highest score on the main SWE Bench [1] without doing any code RAG, so there is some evidence that the repo map is effective at helping the LLM understand large code bases.

[0] https://aider.chat/docs/repomap.html
[1] https://aider.chat/2024/06/02/main-swe-bench.html
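The comment says relevant code is picked by "a graph optimization" over the AST and call graph but doesn't say which one. One common way to rank nodes in a reference graph is PageRank, so here is a rough sketch using it as a stand-in. Assumptions: the graph is file-level, an edge A -> B exists when file A references an identifier that file B defines, and `defines`/`references` are already extracted from the AST. `rank_files` is a hypothetical helper, not Aider's actual code.

```python
from collections import defaultdict


def rank_files(defines, references, damping=0.85, iterations=50):
    """Rank files with PageRank over a file-level reference graph (illustrative sketch).

    defines:    file -> set of identifiers the file defines
    references: file -> list of identifiers the file uses
    An edge A -> B is added for each identifier A references that B defines.
    """
    files = sorted(set(defines) | set(references))

    # Map each identifier to the set of files that define it.
    definers = defaultdict(set)
    for f, idents in defines.items():
        for ident in idents:
            definers[ident].add(f)

    # Weighted out-edges: repeated references strengthen an edge; self-references are skipped.
    out_edges = {f: defaultdict(float) for f in files}
    for f, idents in references.items():
        for ident in idents:
            for target in definers.get(ident, ()):
                if target != f:
                    out_edges[f][target] += 1.0

    n = len(files)
    rank = {f: 1.0 / n for f in files}
    for _ in range(iterations):
        new_rank = {f: (1.0 - damping) / n for f in files}
        for f in files:
            total = sum(out_edges[f].values())
            if total == 0:
                # Dangling file (references nothing known): spread its rank evenly.
                for g in files:
                    new_rank[g] += damping * rank[f] / n
            else:
                for target, weight in out_edges[f].items():
                    new_rank[target] += damping * rank[f] * weight / total
        rank = new_rank

    ranked = sorted(files, key=lambda f: rank[f], reverse=True)
    return ranked, rank
```

Files that many other files depend on float to the top, so a token budget can be spent on their signatures first, rather than on whichever chunks happen to be closest in embedding space.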
It seems like in a large repo, you'd want to have a summary of, say, each module and what its main functions are, and allow the LLM to request repo maps of parts of the repo based on those summaries. For example, my website project has a documentation module, a client-side module, a server-side module, and a deployment module. It would be good for the AI to be able to determine that a particular request requires changes to the client and server parts, and request repo maps for just those.
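Here is a rough sketch of that two-level idea, assuming a hypothetical setup: every module gets a one-line summary, all summaries always go into the prompt, and full repo maps are attached only for the modules that are selected. The module names, summaries, `select_modules`, and `build_context` are all made up for illustration. The selection step uses naive keyword overlap purely as a stand-in for asking the LLM which modules a request touches.

```python
import re

# Hypothetical one-line summaries for the four modules in the example above.
MODULE_SUMMARIES = {
    "docs": "documentation pages, guides, and tutorials",
    "client": "client side UI components, forms, and browser state",
    "server": "server side API routes, database access, and auth",
    "deploy": "deployment scripts, docker images, and CI config",
}

# Tiny stopword list so filler words do not count as matches.
STOPWORDS = {"a", "an", "and", "the", "to", "of", "on", "in", "for", "that", "new"}


def words(text):
    """Lowercase alphabetic tokens, minus stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def select_modules(request, summaries):
    """Return modules whose summary shares words with the request, best match first.

    Naive keyword stand-in for an LLM deciding which modules a request needs.
    """
    req = words(request)
    scores = {name: len(req & words(summary)) for name, summary in summaries.items()}
    hits = [name for name, score in scores.items() if score > 0]
    return sorted(hits, key=lambda name: (-scores[name], name))


def build_context(request, summaries, repo_maps):
    """Include every summary, but repo maps only for the selected modules."""
    parts = [f"{name}: {summary}" for name, summary in summaries.items()]
    for name in select_modules(request, summaries):
        parts.append(f"--- repo map: {name} ---\n{repo_maps[name]}")
    return "\n".join(parts)
```

For a request like "Add a signup form on the client that posts to a new server API route", this picks `server` and `client` and leaves the documentation and deployment maps out of the context.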