by gatkinso
527 days ago
Have any of these LLM coding assistant enthusiasts actually used them? They are OK at autocomplete and writing types. They cannot yet write "80% of your code" on anything shippable. They regularly miss small but crucial details. The more 'enthusiastic' ones, like Windsurf, subtly refactor code, removing or changing functionality, which will have you scratching your head. Programming is a process of building mental models. LLMs are trying to replace something that is fundamental to engineering. Maybe they'll improve, but writing the code was never the hard part. I think the more likely outcome of LLMs is a broad lowering of standards rather than 5x productivity for all.
|
My 2c is that an engineer will still be needed in the loop to build anything meaningful. It's really hard to shape the play-dough of the codebase when you have to give up so much control. I see other people who can't validate the LLM's BS dig themselves into a hole really quickly when using multi-file editing capabilities like the Composer feature.
One thing that kinda blew me away, even though it's embarrassingly predictable, is the "parallel edit" workflow, where the LLM can edit multiple files at the same time using a shared strategy. It almost never works currently for anything non-trivial, so the agent almost never uses it unless you ask it to ("Run the TypeScript check and use parallel edit to fix the bugs"). It's also dumb: it feels like it uses the same auto-completion model, which doesn't have access to the Composer context. But it shows how we can absorb the tax of waiting for the LLM to respond: by doing multiple things at once across many files.
And tests. Lots of tests. There's no way around it: an LLM can spice things up really subtly, and on a big project without tests that's impossible to fix. LLMs don't have the same fear of old files from previous ages; they can start hacking at what's not broken. Cursor has a capable "code review" system with checkpoints and quick rollbacks. At this point in time we have to review most of what the LLM is spewing out. On the upside, LLMs can write decent tests, extract failing cases, etc. There's no excuse not to use tests. However, extra care needs to be taken with LLM-generated tests: review whether their asserts are actually legitimate.
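One cheap way to check whether a generated test's asserts mean anything is to run it against a deliberately broken implementation and see if it still passes. This is only a minimal sketch of that idea (a hand-rolled version of mutation testing, not anything Cursor does); `add`, `broken_add`, `weak_test`, `strong_test` and `is_meaningful` are all made-up names for illustration.

```python
from typing import Callable

Impl = Callable[[int, int], int]
Test = Callable[[Impl], None]


def add(a: int, b: int) -> int:
    """Hypothetical implementation under test."""
    return a + b


def broken_add(a: int, b: int) -> int:
    """Deliberately wrong variant (a 'mutant') of the implementation."""
    return a - b


def weak_test(fn: Impl) -> None:
    # Only checks the return type, so it passes for almost any implementation.
    assert isinstance(fn(2, 0), int)


def strong_test(fn: Impl) -> None:
    # Pins down an actual value, so the broken variant fails it.
    assert fn(2, 3) == 5


def passes(test: Test, fn: Impl) -> bool:
    """Run a test against an implementation and report whether it passed."""
    try:
        test(fn)
    except AssertionError:
        return False
    return True


def is_meaningful(test: Test, good: Impl, bad: Impl) -> bool:
    """A test earns its keep only if it passes on the good implementation
    and fails on the deliberately broken one."""
    return passes(test, good) and not passes(test, bad)
```

Here `is_meaningful(strong_test, add, broken_add)` holds, while `weak_test` passes on both implementations and gets flagged as vacuous. It doesn't replace reading the asserts, but it catches the ones that assert nothing.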
I tried to write an engine for a popular table game without personally understanding the game rules, working from a bunch of reference material. It was difficult because I couldn't validate the assertions. I think I failed the experiment, and I had to learn the game rules by debugging failing tests one by one. In many cases the culprit was a fixture generated by the LLM. We can use combined workflows in which smarter models like o1-pro validate the fixtures; that typically works for well-known concepts and games.
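Before handing fixtures to a smarter model (or a human), a few mechanical invariant checks can already weed out the obviously bogus ones. This is a different and much weaker technique than model-based validation, and the sketch below uses tic-tac-toe purely as a stand-in, since the actual game isn't named here; `Fixture`, `winners` and `fixture_problems` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# All eight winning lines on a 3x3 board, as indices into a 9-char string.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


@dataclass(frozen=True)
class Fixture:
    board: str                      # 9 cells, each 'X', 'O' or '.'
    expected_winner: Optional[str]  # what the generated test asserts


def winners(board: str) -> Set[str]:
    """Return the set of players that have a completed line."""
    return {board[a] for a, b, c in LINES
            if board[a] != "." and board[a] == board[b] == board[c]}


def fixture_problems(fx: Fixture) -> List[str]:
    """List reasons a fixture cannot be a legitimate test case."""
    if len(fx.board) != 9 or set(fx.board) - set("XO."):
        return ["board must be 9 cells of 'X', 'O' or '.'"]
    problems = []
    # X moves first, so X has either as many marks as O or one more.
    if fx.board.count("X") - fx.board.count("O") not in (0, 1):
        problems.append("unreachable position: mark counts are inconsistent")
    won = winners(fx.board)
    if len(won) > 1:
        problems.append("both players have a completed line")
    else:
        actual = next(iter(won)) if won else None
        if actual != fx.expected_winner:
            problems.append(
                f"expected winner {fx.expected_winner!r}, board shows {actual!r}")
    return problems
```

For example, `fixture_problems(Fixture("XXXOO....", "O"))` reports a mismatch because the board shows X winning. Of course this only works once someone knows enough of the rules to write the invariants down, which was exactly the hard part.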