by jahooma
587 days ago
Yes. Natively, the models are limited to 200k tokens, which is on the order of dozens of files and far too small for most codebases. But Codebuff has a preliminary step that searches your codebase for files relevant to your query, and only those get added to the coding agent's context. That's why I think it should work for codebases up to medium-large size. If the codebase is too large, our file-finding step will also start to fail. I would give it a shot on your codebase; I think it should work.
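To make the idea concrete, here is a rough sketch of what a "find relevant files, then fill the context" step could look like. This is not Codebuff's actual implementation: the keyword-overlap scoring, the 4-characters-per-token estimate, and the names `select_files` and `estimate_tokens` are all assumptions made up for illustration.

```python
import re


def tokenize(text):
    """Split text into a set of lowercase word-like terms."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def estimate_tokens(text):
    # Assumption: roughly 4 characters per token. A real tokenizer would differ.
    return max(1, len(text) // 4)


def select_files(query, files, budget_tokens):
    """Pick the files that best match the query without exceeding the budget.

    `files` maps a path to its contents. Scoring is plain keyword overlap,
    a hypothetical stand-in for whatever a real file-finding step uses.
    """
    query_terms = tokenize(query)
    scored = []
    for path, content in files.items():
        overlap = len(query_terms & tokenize(path + " " + content))
        if overlap > 0:
            scored.append((overlap, path))
    # Highest overlap first; ties broken by path for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))

    selected, used = [], 0
    for _, path in scored:
        cost = estimate_tokens(files[path])
        if used + cost <= budget_tokens:
            selected.append(path)
            used += cost
    return selected


if __name__ == "__main__":
    repo = {
        "auth/login.py": "def login(user, password): check password",
        "billing/invoice.py": "def create_invoice(amount): pass",
        "README.md": "project readme",
    }
    print(select_files("fix login password bug", repo, budget_tokens=100))
```

The sketch also shows why this would break down on very large codebases: the more files there are, the more likely a crude relevance score ranks the wrong ones first, and the budget runs out before the right file is included.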
The code extruded from the LLM is still synthetic code, and likely to contain errors, both in the form of extra tokens motivated by the LLM's pre-training data rather than the input texts and in the form of omissions. It's difficult to detect when the output you are relying on is actually missing critical information.
Even if the setup includes links to the retrieved documents, the presence of the generated code discourages users from actually drilling down and reading them.
This is still a framing that says: Your question has an answer, and the computer can give it to you.
1 https://buttondown.com/maiht3k/archive/information-literacy-...