I find there's quite a large spread in ability between various models. Claude models seem to work superbly for me, though I'm not sure whether that's just a quirk of what my projects look like.
I don’t think it’s just a quirk. I’ve tested Claude across Java, Python, and TypeScript projects, among several others. The results are consistent regardless of language or project structure, though it definitely performs better with smaller codebases. For larger ones, it really helps if you’re familiar with the project architecture and can guide it to the right files or modules; that saves a lot of time.