by no_wizard 328 days ago
Corpus also matters. I know Rust developers who aren't getting very good results even with high quality prompts.

On the other hand, I helped integrate Cursor as a staff engineer at my current job for all our developers (many hundreds), who primarily work in JavaScript / TypeScript, and even middling prompts will get results that only require refactoring, assuming the LLM doesn't need a ton of context for the code generation (e.g. greenfield or independent features).

Our general approach and guidance has been that developers need to write the tests first and have Cursor use them as the basis for what code to generate. This helps prevent atrophy, and over time we've found that's where developers add the most value with these tools. I know plenty of developers want to do it the other way (have AI generate the tests), but we've had more issues with that approach.
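To make the tests-first idea concrete, here is a minimal sketch of what that workflow can look like. Everything in it is hypothetical and just for illustration: the `slugify` example, the `slugifySpec` cases, and the `checkAgainstSpec` helper are my own stand-ins, not anything from our codebase or from Cursor itself. The developer writes the spec; the candidate implementation stands in for whatever the tool generates.

```typescript
// Hypothetical function under test; the name and behaviour are illustrative only.
type Slugify = (title: string) => string;

// Step 1: the developer writes the spec first, as plain [input, expected] pairs.
const slugifySpec: Array<[string, string]> = [
  ["Hello World", "hello-world"],
  ["  Trim me  ", "trim-me"],
  ["Rust & TypeScript!", "rust-typescript"],
  ["", ""],
];

// Runs every case against a candidate implementation and returns the failures.
function checkAgainstSpec(impl: Slugify): string[] {
  const failures: string[] = [];
  for (const [input, expected] of slugifySpec) {
    const actual = impl(input);
    if (actual !== expected) {
      failures.push(
        `slugify(${JSON.stringify(input)}) returned ${JSON.stringify(actual)}, ` +
          `expected ${JSON.stringify(expected)}`
      );
    }
  }
  return failures;
}

// Step 2: a candidate implementation, standing in for what the tool generates.
const candidate: Slugify = (title) =>
  title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse each run of non-alphanumerics into one dash
    .replace(/^-+|-+$/g, ""); // strip leading and trailing dashes

// Step 3: the human-written spec decides whether the generated code is accepted.
const failures = checkAgainstSpec(candidate);
console.log(failures.length === 0 ? "spec satisfied" : failures.join("\n"));
```

The point of the shape is that the spec is the part a human owns, so a wrong generation shows up as a failing case rather than as something a reviewer has to spot by reading the output.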

We discourage AI generating everything and having a human edit the output, as it tends to be slower than our chosen approach and more likely to have issues.

That said, LLMs still struggle if they need to hold a lot of context, for instance when there are a bunch of files the model needs to understand before it can generate worthwhile code, particularly if you want it to re-use existing code.


>Corpus also matters. I know Rust developers who aren't getting very good results even with high quality prompts.

Which model were they using, out of interest? I've gotten decent results for Rust from Gemini 2.5 Pro. Its first attempt will often be disgusting (cloning and other inefficiencies everywhere), but it can be prompted to optimise that afterwards. It also helps a lot to think ahead about lifetimes and explicitly tell it how to structure them, if there might be anything tricky lifetime-wise.

No idea. I do know they all have access to Cursor and tried different models, even the more expensive options.

What you're describing, though, having to go into that much elaborate detail, really drives home my point, and I think it shows a weakness in these tools that is a hidden cost to scaling their productivity benefits.

What I can tell you, from both observation and experience, is that because the corpus for TypeScript / JavaScript is vastly larger as it stands today, even Gemini 2.5 Pro will 'get to correct' faster with middling prompts than it will for a language like Rust.

I do a lot of work in a rather obscure technology (Kamailio) with an embedded domain-specific scripting language (C-style) that was invented in the early 2000s specifically for that purpose, and can corroborate this.

Although the training data set is not wholly bereft of Kamailio configurations, it's not well-represented, and it would be at least a few orders of magnitude smaller than any mainstream programming language. I've essentially never had it spit out anything faintly useful or complete Kamailio-wise, and LLM guidance on Kamailio issues is at least 50% hallucinations / smoking crack.

This is irrespective of prompt quality; I've been working with Kamailio since 2006 and have always enjoyed writing, so you can count on me to formulate a prompt that is both comprehensive and intricately specific. Regardless, it's often a GPT-2 level experience, or akin to running some heavily quantised 3bn parameter local Llama that doesn't actually know much of anything specific.

From this, one can conclude that a tremendous amount of reinforcement for the weights is needed before the LLM can produce useful results in anything that isn't quasi-universal.

I do think, from a labour-political perspective, that this will lead to some guarding and fencing to try to prevent one's work-product from functioning as free training for LLMs that the financial classes intend to use to displace you. I've speculated before that this will probably harm the culture of open-source, as there will now be a tension between maximal openness and digital serfdom to the LLM companies. I can easily see myself saying:

I know our next commercial product (based on open-source inputs) releases, which are on-premise for various regulatory and security reasons, will be binary-only; I have never minded customers looking through our plain-text scripts before, but I don't want them fed into LLMs for experiments with AI slop.