by Jcampuzano2
492 days ago
I usually use Claude 3.5 Sonnet since it's still the one I've had the best luck with for coding tasks. When it comes to 10k LOC codebases, I still don't really trust it with anything. My best luck has been with small personal projects, where I can sort of trust it to make larger-scale changes, though "larger scale" there is still small in absolute terms. I've found it best for generating tests and for autocompletion: especially if you give it context via function names and parameter names, it can oftentimes complete a whole function I was about to write, using the interfaces available to it in files I've visited recently. Beyond that I don't use it for much, apart from starting from scratch on a new feature or helping me get a plan together before I start working on something I may be unfamiliar with. We have access to all the models available through Copilot, including o3 and o1, as well as ChatGPT Enterprise, and I do find the chat interface nice just for architecting and planning. But I usually do the actual coding with help from autocompletion, since it honestly takes longer to wrangle it into doing the correct thing than to do it myself with a little bit of its help.
|
It's when I try to give it a clear, logical specification for a full feature and expect it to write everything that's required to deliver that feature (or the entirety of a slightly-more-than-non-trivial personal project) that it falls over.
I've experimented with getting it to do this (for features or personal projects that require maybe 200-400 LOC), mostly just to see what the limitations of the tool are.
Interestingly, I hit a wall with GPT-4 on a ~300 LOC personal project that o3-mini-high was able to overcome. So, as you'd expect, the models are getting better. When I pushed my use case only a little bit further with a few more enhancements, however, o3-mini-high fell over in precisely the same ways as GPT-4, only a bit worse in the volume and severity of errors.
The improvement between GPT-4 and o3-mini-high felt nominally incremental (which I guess is what they're claiming it offers).
Just to say: having seen similar small bumps in capability over the last few years of model releases, I tend to agree with other posters that it feels like we'll need something revolutionary to deliver on a lot of the hype being sold at the moment. I don't think current LLMs / approaches are going to cut it.