Hacker News
by _tom_ 696 days ago
I'm interested but skeptical. I have not dived as deeply as you. Mostly I'm using ChatGPT. There is just no way it could generate 90% of my code. It is great at generating boilerplate for simple cases. I find it most useful for getting started on things I know nothing about. For example, I was working with SVG recently, something I know nothing about. I have to say ChatGPT was helpful, but not, in the long run, useful. There were too many errors, and its ability to refine answers is terrible. Too many attempts at correction are met with cheerful fixes that have the same bugs.

Is anyone else actually getting good results for code generation using LLMs?

6 comments

Are you using the paid GPT-4? It is a world of difference compared to the free tier.

Like the author, I am now writing 80%+ of my code in ChatGPT. Every now and then something pops up that it doesn't quite understand, and I have to pick up my shovel and head back into the mines. But mostly, with good prompting in ChatGPT, and by preceding everything I write in my IDE with a comment explaining what I'm doing, Copilot can do the rest.
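A minimal sketch of that comment-first habit, in Python: the comment states the intent, and the function below it is the kind of short completion an assistant might offer. The function name `weekday_name` is hypothetical, and the body was written by hand for illustration; it is not captured Copilot output.

```python
from datetime import date


# Given an ISO-8601 date string like "2024-03-15", return the English weekday
# name (e.g. "Friday"). A comment like this one is the intent description
# that the assistant completes from.
def weekday_name(iso_date: str) -> str:
    names = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    return names[date.fromisoformat(iso_date).weekday()]
```

The point is that the comment does the specifying and the completion stays small enough to check at a glance.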

It's a great tool in the way that Google search once was, and programming IDEs are. But it takes some time to feel it out and see where it's useful and where it isn't, similar to learning how to use Google search and feeling out the opaque functionality in an IDE.

At an AWS event last week there was a quote: 'Jobs aren't going to be replaced by AI. But people doing jobs without AI will be replaced by people with AI.'

Are you at least testing the code? Or are you delegating testing to AI as well?
I describe the test in natural language and AI writes the boilerplate. A typical workflow for adding a simple new backend endpoint to my company's web application will be:

1) Send a ChatGPT request with the existing router code, and describe in natural language the name of the new endpoint, what I need it to do, and any functions I want it to use.
2) Read through the result and check that the logic is OK.
3) Send a ChatGPT request with the existing router tests, and describe in natural language which endpoint I want to hit and what I want the test to verify.
4) Check that the response makes sense and run the test.
5) Debug any errors on my own, or iterate back and forth again with ChatGPT.
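To make steps 1 and 3 concrete, here is a stdlib-only Python sketch of the kind of code that might come back. The `Router` class, the `/health` endpoint, and the test names are all hypothetical stand-ins for "the existing router code" and "the existing router tests"; a real web framework's routing API will differ. Steps 2, 4 and 5 (reading, running, debugging) stay with the human.

```python
import unittest
from typing import Callable, Dict, Tuple

Response = Tuple[int, dict]


class Router:
    """Hypothetical minimal router: maps a path to a handler returning (status, body)."""

    def __init__(self) -> None:
        self.routes: Dict[str, Callable[[], Response]] = {}

    def route(self, path: str):
        # Decorator that registers a handler under the given path.
        def register(handler: Callable[[], Response]) -> Callable[[], Response]:
            self.routes[path] = handler
            return handler
        return register

    def dispatch(self, path: str) -> Response:
        handler = self.routes.get(path)
        if handler is None:
            return 404, {"error": "not found"}
        return handler()


router = Router()


# Step 1: the new endpoint, as described in natural language
# ("add /health that returns status ok").
@router.route("/health")
def health() -> Response:
    return 200, {"status": "ok"}


# Step 3: the matching test ("hit /health and check it returns 200 and ok").
class HealthEndpointTest(unittest.TestCase):
    def test_health_returns_ok(self) -> None:
        status, body = router.dispatch("/health")
        self.assertEqual(status, 200)
        self.assertEqual(body, {"status": "ok"})

    def test_unknown_path_returns_404(self) -> None:
        status, _ = router.dispatch("/missing")
        self.assertEqual(status, 404)
```

The test class is picked up by any standard `unittest` runner, which covers step 4.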

We are still in the early stages of what it means to have an intelligent natural language system at our fingertips. For me, it means no longer really needing to bother with repetitive boilerplate code and test harnesses. This is a huge speedup for my professional workflow and a significantly more enjoyable coding experience.

I have not had good results and stopped trying. I have had some usable results, but on careful inspection there were subtle problems, or needless convolutions that implied a different solution was being used than was actually the case. The sort of thing that works but is prone to misinterpretation by the next person working on the code.

Based on this, I'm very much against using it for things the user doesn't have significant knowledge of. Some coworkers seem to be having better success but I definitely get the sense they are reading and editing the results carefully. I don't find it much of a productivity gain, if any, so I stopped trying for now.

> Some coworkers seem to be having better success but I definitely get the sense they are reading and editing the results carefully.

Yes, you need to consider the AI as if it were a junior programmer that sometimes makes mistakes. I use it for boring work that can be quickly checked. For example, the other day I asked for a 'give me next workday' algorithm based on the code structure I had, and it worked fine.
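The commenter doesn't share their code or their existing code structure, so the following is only a minimal Python sketch of what a "next workday" helper typically looks like, assuming workdays are Monday to Friday and holidays are passed in as an optional set of dates. The name `next_workday` and the `holidays` parameter are hypothetical.

```python
from datetime import date, timedelta
from typing import AbstractSet


def next_workday(day: date, holidays: AbstractSet[date] = frozenset()) -> date:
    """Return the first date strictly after `day` that is Mon-Fri and not a holiday.

    Assumes a Monday-to-Friday work week; `holidays` is a hypothetical
    caller-supplied set of non-working dates.
    """
    candidate = day + timedelta(days=1)
    # weekday(): Monday == 0 ... Sunday == 6, so >= 5 means Saturday or Sunday.
    while candidate.weekday() >= 5 or candidate in holidays:
        candidate += timedelta(days=1)
    return candidate
```

This is the kind of small, quickly checkable function the comment describes: a few hand-picked dates (a Friday, a Friday before a Monday holiday, a midweek day) go a long way toward verifying it.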

It's just one more tool in the toolbox.

If it's that straightforward, I'd rather just write it. Like I said, it hasn't been an overall time saver given the extra scrutiny I need to put it through. I'll try again in six months.

Also, idk, kind of a tangent, but you brought it up: I don't feel like my junior devs make easily found algorithmic mistakes like that. They're more likely to misjudge the scope of the problem, or not be aware of a technical consideration or known solution. For that kind of work I'd rather... mentor a junior dev through it so they have the experience.

> Mostly I'm using ChatGPT. There is just no way it could generate 90% of my code (...) Is anyone else actually getting good results for code generation using LLMs?

Try to switch perspective from "write component ‘X’ and paste code without reading it" to "describe the problem, break it into smaller steps, generate code for each step, and iteratively work towards the final solution".

In the first case, LLMs can't do 90% of the work alone. In the latter case, it's different.

You could ask, "Okay, so that's still a lot of work generating code with LLMs," and you'd be right. But it's like having another programmer sitting next to you, helping tackle problems or time-consuming tasks, giving you more space to think about the actual problem.

So, “Up to 90% of my code is now generated by AI” doesn’t mean that only 10% of the entire software development process is left for humans. Writing code is just (obviously) one aspect of software development.

The second you get into a topic it doesn’t know, ChatGPT starts flailing around, unfortunately. I’ve had this happen with virtually anything of substance, even if I paste in a whole tutorial and say “do this but in X language or with modification Y”.

Copilot on the other hand works great and is a huge improvement in productivity, probably because it’s only writing short snippets while I’m doing the algorithmic thinking. It reduces the brain -> keyboard lag substantially.

Claude 3.5 Sonnet is pretty good at working with small, isolated pieces of code (think single file). But it's not fast. I did manage to get it to build a full feature by itself, but it took an entire evening of copy-pasting code and error messages. The final code quality was pretty good.

Most of the time I just use it for getting started on features, small functions, and debugging.

Copilot gets the percentage higher.