|
|
|
|
|
by avernet
638 days ago
|
|
And how does repogather do it? From the README, it looks to me like it might provide the content of each file to the LLM to gauge its relevance. But this would seem prohibitively expensive on anything that isn't a very small codebase (the project I'm working on has on the order of 400k SLOC), even with gpt-4o-mini, wouldn't it? |
|
I think in the future as the cost of gpt-4o-mini level intelligence decreases, it will become increasingly worth it, even for larger repositories, to simply attend to every token for certain coding subtasks. I'm assuming here that relevance filtering is a much easier task than coding itself, otherwise you could just copy/paste everything into the final coding model's context. What I think would make much more sense for this project is to optimize the cost / performance of a small LLM fine-tuned for this source relevance task. I suspect I could do much better than gpt-4o-mini, but it would be difficult to deploy this for free.