Hacker News
by maelito 487 days ago
Given a repository of 3,648,318 tokens (number from Repomix), I'm not sure what it would cost to have a leading LLM analyse it and suggest improvements.

Isn't the input token limit far lower than that?
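
For a rough sense of scale, here is a back-of-the-envelope sketch of that arithmetic. Only the repository size comes from the post; the context window and the price are placeholder assumptions to be swapped for whatever your model actually offers.

```python
import math

def windows_needed(repo_tokens: int, context_window: int, reserved: int = 0) -> int:
    """Number of separate requests needed to show the model every token once."""
    usable = context_window - reserved  # room left after prompt/answer overhead
    if usable <= 0:
        raise ValueError("reserved tokens leave no room for input")
    return math.ceil(repo_tokens / usable)

def input_cost(repo_tokens: int, price_per_million: float) -> float:
    """Cost of sending every token once, at a price per million input tokens."""
    return repo_tokens / 1_000_000 * price_per_million

REPO_TOKENS = 3_648_318     # figure from the post (Repomix count)
CONTEXT_WINDOW = 200_000    # assumption: replace with your model's real limit
PRICE_PER_MILLION = 3.00    # assumption: placeholder price, not a quote

print(windows_needed(REPO_TOKENS, CONTEXT_WINDOW))           # 19
print(round(input_cost(REPO_TOKENS, PRICE_PER_MILLION), 2))  # 10.94
```

Under those assumed numbers a single pass does not fit in one request, which is why the question of scoping matters more than the raw price.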

This is unclear to me in the "non-Greenfield" part of the article.

Iterating with aider on very limited scopes is easy; I've used it often. But what about understanding a whole repository and acting on it? Following imports to understand a TypeScript codebase as a whole?
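
To make "following imports" concrete, a minimal sketch in Python of pulling relative module specifiers out of TypeScript source. The regex and the function name are my own illustration, not anything aider or Repomix does. It will also match text inside comments and strings, and it ignores tsconfig path aliases, so a real tool would more likely go through the TypeScript compiler.

```python
import re

# Captures the module specifier in `import ... from "x"`, `export ... from "x"`,
# side-effect `import "x"`, dynamic `import("x")`, and `require("x")`.
IMPORT_RE = re.compile(
    r"""(?:\bfrom\s*|\bimport\s*\(?\s*|\brequire\s*\(\s*)["']([^"']+)["']"""
)

def relative_imports(source: str) -> list[str]:
    """Return relative specifiers (starting with '.') in order of appearance."""
    return [spec for spec in IMPORT_RE.findall(source) if spec.startswith(".")]

sample = '''
import { a } from "./a";
import type { B } from '../types/b';
import React from "react";
import "./side-effect";
const c = require("./c");
export * from "./d";
'''

print(relative_imports(sample))
```

Running this over every file gives the edges of a dependency graph; package imports such as "react" are dropped because they do not point into the repository.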

1 comments

Well, do you as a human have the whole codebase loaded into your memory, with the ability to mentally reason about it? No, you work on a small scope at a time.
You may work in a limited scope at a time, but you are aware of how it fits into the larger scope, and more often than not you actually have to connect things across different scopes.
I work on projects with hundreds of thousands of files and tens of millions of lines. I am honestly clueless about how most of it fits together, and working here as a human often feels not too different (just a few steps of scale higher) from LLM-based coding with tools like Cursor.
Well, you can use an LLM similarly. Have it write docs for all your files, including a summary for each function / class, ideally in order of dependency. Then use only the summaries in context. This should significantly lower your token count.

Haven't tried it personally, but it should work.
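
A sketch of the "in order of dependency" part, in Python, assuming you already have a map from each file to the files it imports. The `summarize` callable is hypothetical: it stands in for whatever LLM call you would make, and here it is stubbed so the example runs on its own.

```python
from graphlib import TopologicalSorter

def summary_order(deps: dict[str, set[str]]) -> list[str]:
    """Order files so that each file comes after everything it imports."""
    # TopologicalSorter treats the mapped values as predecessors, so imports
    # are emitted first. Circular imports raise graphlib.CycleError.
    return list(TopologicalSorter(deps).static_order())

def summarize_all(deps, sources, summarize):
    """Summarize every file, passing along the summaries of its dependencies."""
    summaries = {}
    for path in summary_order(deps):
        dep_notes = "\n".join(summaries[d] for d in sorted(deps.get(path, ())))
        summaries[path] = summarize(path, sources[path], dep_notes)
    return summaries

# Hypothetical three-file project.
deps = {"app.ts": {"db.ts", "util.ts"}, "db.ts": {"util.ts"}, "util.ts": set()}
sources = {"app.ts": "// app", "db.ts": "// db", "util.ts": "// util"}

def fake_summarize(path, source, dep_notes):
    # Stand-in for an LLM call: records which dependency notes were visible.
    seen = [line.split(":")[0] for line in dep_notes.splitlines()]
    return f"{path}: {len(source)} chars, saw {seen}"

print(summary_order(deps))
print(summarize_all(deps, sources, fake_summarize)["app.ts"])
```

The point of the ordering is that by the time a file is summarized, the summaries of everything it depends on already exist and can be given to the model instead of the full source.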

In my experience, you often remember and/or discover relationships to other parts of the system during the current development task, by delving into the implementation. These relationships also aren't necessarily explicit in the code you're looking at. For example, they can relate to domain-level invariants or to shared resources, or simply shared patterns and conventions. In general you can't prepare everything that would be relevant up front.
You do the same thing with the LLM: you have it describe the API of the modules not related to your change, and use that description in place of those segments of the code.
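
Roughly what that substitution could look like, as a Python sketch. The function name, the header format, and the sample files are all made up for illustration; the API descriptions are assumed to have been produced beforehand.

```python
def build_context(sources: dict[str, str], api_notes: dict[str, str], focus: set[str]) -> str:
    """Full source for the files being worked on, API descriptions for the rest."""
    parts = []
    for path in sorted(sources):
        if path in focus:
            parts.append(f"=== {path} (full source) ===\n{sources[path]}")
        else:
            note = api_notes.get(path, "(no API description available)")
            parts.append(f"=== {path} (API description only) ===\n{note}")
    return "\n\n".join(parts)

# Hypothetical inputs.
sources = {
    "billing.ts": "export function charge(id: string) { /* long body */ }",
    "auth.ts": "export function login(user: string) { /* long body */ }",
}
api_notes = {"auth.ts": "login(user: string): starts a session for the user"}

print(build_context(sources, api_notes, focus={"billing.ts"}))
```

This does not answer the objection above about relationships you only discover mid-task; it only shrinks what gets sent, so the focus set still has to be widened by hand (or by the model) when something unexpected turns up.
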
On larger codebases, I use tools heavily instead of just winging it. If the LLM can't either orchestrate those tools or apply its own whole-program analysis, it is quite impossible for it to do anything useful.