HN Mirror

Hacker News


	by gatkinso 527 days ago
	Have any of these LLM coding assistant enthusiasts actually used them? They are OK at autocomplete and writing types. They cannot yet write "80% of your code" on anything shippable, and they regularly miss small but crucial details. The more 'enthusiastic' ones, like Windsurf, subtly refactor code, removing or changing functionality in ways that will have you scratching your head. Programming is a process of building mental models, and LLMs are trying to replace something that is fundamental to engineering. Maybe they'll improve, but writing the code was never the hard part. I think the more likely outcome of LLMs is a broad lowering of standards rather than 5x productivity for all.

Inviz 527 days ago

I'm trying to use Cursor full time to build applications for my personal needs (with years of experience in the field). It can be rough, but you see glimpses of the future. In these early weeks, I constantly have to think of new ways to minimize surprises: prompt injection via a dotfile (a collection of dos and don'ts), making the LLM write JSDoc @instructions for itself in problematic files, and writing project overviews that state the goals and responsibilities of different parts. It's all very flaky, but there's promise. I think it is a good time to try mastering the LLM as a helper, and to figure out how to play your best game with this ultra-sharp tool.
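
To make the JSDoc idea concrete, here is a minimal sketch of what such an in-file note might look like. `@instructions` is not a standard JSDoc tag, just a convention the model is told to read and keep up to date; the function `roundToNickel` and the rules listed are hypothetical examples, not taken from the comment.

```typescript
/**
 * Rounds a price in cents to the nearest 5 cents (hypothetical legacy rule).
 *
 * @instructions
 * - Custom, non-standard tag: read these rules before editing this file.
 * - Keep this function pure and synchronous: no I/O, no global state.
 * - Do not change the rounding mode; already-issued invoices depend on it.
 * - Add a test case before touching any branch.
 */
function roundToNickel(cents: number): number {
  // Round to the nearest multiple of 5 cents.
  return Math.round(cents / 5) * 5;
}

console.log(roundToNickel(63)); // 65
```

The apparent appeal of this approach is that the rules travel with the file, so they end up in the context whenever the file does, instead of depending on a separate overview document being included.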

My 2c is that an engineer will be needed in the loop to build anything meaningful. It's really hard to shape the play-dough of a codebase when you have to give up so much control. I see people who can't validate the LLM's BS get themselves into a hole really quickly when using multi-file editing capabilities like the Composer feature.

One thing that kinda blew me away, even though it's embarrassingly predictable, is the "parallel edit" workflow, where the LLM edits multiple files at the same time using a shared strategy. It almost never works for anything non-trivial yet, so the agent almost never uses it unless you ask it to ("Run TypeScript check and use parallel edit to fix the bugs"). It also feels dumb, as if it uses the same autocompletion model, which doesn't have access to the Composer context. But it shows how we can absorb the tax of waiting for the LLM to respond: by doing multiple things at once across many files.

And tests. Lots of tests. There's no way around it: an LLM can spice things up really subtly, and on a big project that's impossible to fix without tests. LLMs don't have the same fear of old files from previous ages; they can start hacking at what's not broken. Cursor has a capable "code review" system with checkpoints and quick rollbacks. At this point in time we still have to review most of what the LLM spews out. On the upside, LLMs can write decent tests, extract failing cases, etc., so there's no excuse not to use tests. However, LLM-generated tests need extra care: review whether their asserts are actually legitimate.
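
A minimal sketch of the kind of assert worth flagging when reviewing LLM-generated tests, assuming a hypothetical `handTotal` helper with a deliberately planted bug: an expected value computed by the code under test can never fail, while one worked out by hand catches the bug.

```typescript
// Hypothetical buggy implementation: it silently ignores the last card.
function handTotal(cards: number[]): number {
  return cards.slice(0, -1).reduce((sum, value) => sum + value, 0);
}

const hand = [10, 7, 4];

// Red flag: the expected value comes from the code under test,
// so this check passes even though handTotal is wrong.
const tautological = handTotal(hand) === handTotal(hand);

// Better: the expected value is worked out by hand (10 + 7 + 4 = 21),
// independently of the implementation, so this check exposes the bug.
const independent = handTotal(hand) === 21;

console.log({ tautological, independent }); // { tautological: true, independent: false }
```

The same reasoning applies to fixtures: if the expected data was produced by the same model that wrote the implementation, it may just encode the same misunderstanding.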

I tried to write an engine for a popular tabletop game without personally understanding the rules, and I had a bunch of reference material. It was difficult when I couldn't validate the assertions. I think I failed the experiment: I had to learn the rules by debugging failing tests one by one, and in many cases the culprit was a fixture generated by the LLM. We can use combined workflows where smarter models like o1-pro validate the fixtures; that typically works for well-known concepts and games.

gatkinso 526 days ago

I use Cursor as well, and I totally understand what you mean about glimpses of the future. Composer and tab completion are good at writing isolated units of functionality, i.e. pure functions, especially when they are trivial and have been written before. I appreciate that it saves me a Google search or time spent on rote work.

I do wonder how much more sophisticated LLMs would have to be to really excel at multi-file or crosscutting whole-codebase editing. I have no way of really quantifying this, but I suspect it's a lot more than people think, and even then it might require a substantially different approach to making software. It kind of reminds me of the people trying to make full-length movies with video AI: cool trick, but who wants to sit through 1.5 hrs of that? The gulf between those videos and an actual feature-length film is _massive_, much like the difference between a prod codebase and a to-do CRUD app. And it's not just about the level of detail; it's about coherence between small but important things and the overall structure.

r3c0nc1l3r 527 days ago

I too use Cursor as my primary IDE for a variety of projects. This will sound clichéd in the midst of the current AI hype, but the most critical part of getting good results with Cursor is properly managing the context and the conversation length.

With large projects, I've found that a good baseline is about 300-500 lines of output and around twelve 500-line files as input. Inside those limits Cursor seems to do pretty well, but if you overstuff the context window (even within the supposedly supported limits), Claude 3.5 Sonnet tends to forget important attributes of the input.
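
A minimal sketch of applying this rule of thumb mechanically, assuming you already know each file's line count. The budgets are the heuristic numbers from this comment, not documented Cursor or Claude limits, and `planBatches` is a hypothetical helper. It only models the input side; the 300-500 line output budget is left to the prompt itself.

```typescript
// Heuristic budgets from the rule of thumb above, not official limits.
const MAX_FILES_PER_PROMPT = 12;
const MAX_INPUT_LINES = 12 * 500; // roughly twelve 500-line files

interface SourceFile {
  path: string;
  lines: number;
}

interface BatchPlan {
  batches: SourceFile[][];
  oversized: SourceFile[]; // files too large to fit even on their own
}

// Greedily pack files, in the given order, into batches that stay within
// both the file-count budget and the total line budget.
function planBatches(files: SourceFile[]): BatchPlan {
  const batches: SourceFile[][] = [];
  const oversized: SourceFile[] = [];
  let current: SourceFile[] = [];
  let currentLines = 0;

  for (const file of files) {
    if (file.lines > MAX_INPUT_LINES) {
      oversized.push(file); // needs splitting before it can go into a prompt
      continue;
    }
    const tooManyFiles = current.length + 1 > MAX_FILES_PER_PROMPT;
    const tooManyLines = currentLines + file.lines > MAX_INPUT_LINES;
    if ((tooManyFiles || tooManyLines) && current.length > 0) {
      batches.push(current); // close the current batch and start a new one
      current = [];
      currentLines = 0;
    }
    current.push(file);
    currentLines += file.lines;
  }
  if (current.length > 0) batches.push(current);
  return { batches, oversized };
}

// Usage: 13 small files spill into a second batch; a 7000-line file is flagged.
const files: SourceFile[] = [
  ...Array.from({ length: 13 }, (_, i) => ({ path: `src/mod${i}.ts`, lines: 100 })),
  { path: "src/huge.ts", lines: 7000 },
];
const plan = planBatches(files);
console.log(plan.batches.map((b) => b.length), plan.oversized.map((f) => f.path));
// [ 12, 1 ] [ 'src/huge.ts' ]
```

Packing greedily in the given order keeps files that are listed together in the same batch where possible; it is deliberately simple rather than an optimal bin packing.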

Another critical component is keeping conversations short. Don't reuse the same conversation for more than one task, and try to engineer prompts that minimize the number and size of edits.

I definitely agree that Cursor isn't a silver bullet, but it's one of the best tools I've found for bootstrapping large projects as a solo dev.
