by weavejester 238 days ago
"I’m not sure if anyone else feels this way, but with the introduction of generative AI, I don’t find coding fun anymore. It’s hard to motivate myself to code knowing that a model can do it much quicker. The joy of coding for me was literally the process of coding."

I experimented with GPT-5 recently and found its capabilities to be significantly inferior to that of a human, at least when it came to coding.

I was trying to give it an optimal environment, so I set it to work on a small JavaScript/HTML web application, and I divided the task into small steps, as I'd heard it did best under those circumstances.

I was impressed overall by how far the technology has come, but it produced a number of elementary errors, such as putting JavaScript outside the script tags. As the code grew, there was also no sense that it had a good idea of how to structure the codebase, even when I suggested it analyze and refactor.

So unless there are far more capable models out there, we're not at the stage where generative AI can match a human.

In general I find current models to have broad but shallow thinking. They can draw on many sources, which is extremely useful, but seem to have problems reasoning things through in depth.

All this is to say that I don't find the joy of coding to have gone at all. In fact, there have been a number of really thorny problems I've had to deal with recently that I'd love to have side-stepped, but due to the current limitations of LLMs I had to solve them the old-fashioned way.

3 comments

It's so strange. I do all the things you mention and it works brilliantly well 10 times out of 11.
You are probably doing something others have done before frequently.

I find LLMs struggle constantly with languages that have little documentation, or whose documentation is out of date. RAG, LoRA and multiple agents help, but they have their own issues as well.

The OP was working on "a small JavaScript/HTML web application".

This is a particular sweet spot for LLMs at the moment. I'll regularly one-shot entire NextJS codebases with custom styling in both Codex and Claude.

But it turns out the OP is using Copilot. That just isn't competitive anymore.

I'll see if I can run the experiment again with Codex, if not on the exact same project then a similar one. The advice I'm getting in the other comments is that Codex is more state of the art.

As a quick check I asked Codex to look over the existing source code, generated via Copilot using the GPT-5 agent. I asked it to consider ways of refactoring, and then to implement them. Obviously a fairer test would be to start from scratch, but that would require more effort on my part.

The refactor didn't break anything, which is actually pretty impressive, and there are some improvements. However, if a human suggested this refactor I'd have a lot of notes. There are functions that are badly named or placed, a number of odd decisions, and it increases the code size by 40%. It certainly falls far short of what I'd expect from a capable coder.

> and found its capabilities to be significantly inferior to that of a human, at least when it came to coding.

I think we should step back and ask: do we really want that? What does that imply? Until recently nobody would use a tool and think, yuck, that was inferior to a human.

> I experimented with GPT-5 recently

GPT-5 what? The GPT-5 models range from goofily stupid to brilliant. If you let it select the model automatically, which is the case by default, it will tend to lean towards the former.

I was using GitHub Copilot Pro with VS Code, and the agent was labelled "GPT-5". Is this a particularly poor version of the model?

I also briefly tried out some of the other paid-for models, but mostly worked with GPT-5.

Try OpenAI Codex with GPT5-codex medium

The technology is progressing very fast, and that includes both the models and the tooling around it.

For example, Gemini 2.5 was considered a great model for coding when it launched. Now it is far inferior to Codex and Claude Code.

The GitHub Copilot tooling is (currently) mediocre. It's ok as a better autocomplete but can't really compete with Codex or Claude or even Jules (Gemini) when using it as an agent.

I'll try out Codex and see how that performs. Presumably I can just use OpenAI's Codex extension in VS Code?
Maybe. There are a few different things named "Codex" from OpenAI (yes, needlessly confusing): one "Codex" is a git-centric product, the other is the GPT-5-Codex agentic coding model. I recommend installing the Codex CLI if you're able to and selecting the model via `/model`.

  npm install -g @openai/codex
https://github.com/openai/codex

Frankly, yes.

The models are one part of the story. But the software around them matters at least as much. What tools does the model have access to: bash, just file reading, or (as in your example!) just a cache of files visited by the IDE (!)? How does the software decide what extra context to provide to the model? How does it record past learnings from conversations and failed test runs (if at all!), and how are those fed back in? And of course, what are the system prompts?

None of this is about the model; it's all "plain old" software, and is the stuff around the model. Increasingly, that's where the quality differences lie.

I am sorry to say but Copilot is just sort of shoddy in this regard. I like Claude, some people like Codex, there are a bunch of options.

But my main point is - it's probably not about the model, but about the products built on the models, which can vary wildly in quality.
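
To make the "stuff around the model" point a bit more concrete, here is a rough Python sketch of the kind of plain old software involved. It is purely illustrative and is not how Copilot, Codex or Claude Code are actually implemented; `Harness`, `record` and `build_prompt` are made-up names, and a real product would do far more (tool execution, context selection, retries).

```python
from dataclasses import dataclass, field


@dataclass
class Harness:
    """Hypothetical wrapper around a model; not any real product's API."""

    system_prompt: str
    tools: list[str]  # names of tools the model may call, e.g. "bash"
    notes: list[str] = field(default_factory=list)  # learnings kept between turns

    def record(self, note: str) -> None:
        """Remember something from a past conversation or failed test run."""
        self.notes.append(note)

    def build_prompt(self, task: str, context_files: dict[str, str]) -> str:
        """Assemble everything the model will actually see for one request."""
        parts = [self.system_prompt, "Tools: " + ", ".join(self.tools)]
        if self.notes:
            parts.append("Notes:\n" + "\n".join("- " + n for n in self.notes))
        for path, text in context_files.items():
            parts.append(f"File {path}:\n{text}")
        parts.append("Task: " + task)
        return "\n\n".join(parts)
```

Two harnesses calling the exact same model can differ in every one of these pieces (which tools are listed, which files are included, whether any notes survive between runs), which is one plausible reason the same "GPT-5" label can behave so differently across products.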

In my experience with both Copilot and Claude, Claude makes subtler mistakes that are harder to spot, which also gobbles up time. Yes, giving it CLI access is pretty cool and helps with scaffolding things. But unless you know exactly what you want to write, and exactly how it should work, to the degree that you will notice the footguns it can add deep in your structures, I wouldn't recommend anyone use it to build something professional.