> One, your projects are small enough that you can reasonably provide enough context for the language model to be useful. Two, you’re using the most common languages in the training data. Three, because of those factors, you’re willing to put much more work into learning how to use it effectively, since it can actually produce useful content for you.

Take a look at the 2024 Stack Overflow survey. 70% of professional developer respondents had done extensive work over the last year only in languages from this list:

- JavaScript: 64.6%
- SQL: 54.1%
- HTML/CSS: 52.9%
- Python: 46.9%
- TypeScript: 43.4%
- Bash/Shell: 34.2%
- Java: 30%

LLMs are of course very strong in all of these. 70% of developers only code in languages LLMs are very strong at. If anything, for the developer population at large, this number is even higher than 70%. The survey respondents are overwhelmingly American (where the dev landscape is more diverse), and they self-select toward people who use niche stuff and want to let the world know.

A similar argument can be made for median codebase size, in terms of lines of code written every year. A few days ago he also gave Gemini 2.5 Pro a whole codebase (~300k tokens) and it performed well. Even in huge codebases, if there is any kind of separation of concerns, that’s enough to provide all the context relevant to the part of the code you’re working on. [1]

[1] https://simonwillison.net/2025/Mar/25/gemini/
But really, that’s the vision of actual utility that I imagined when this stuff first started coming out and that I’d still love to see: something that integrates with your editor, trains on your giant legacy codebase, and is actually useful for answering questions about it and maybe suggesting code. It seems like we might get there eventually, but I haven’t seen evidence that we’re there yet.