Hacker News
by AlexanderNull 830 days ago
Neither GitHub Copilot nor GPT-4 is worth your time. At best they partially guess the name of a function you're thinking about typing; at worst they give you an almost-correct answer. I've been shocked by how close those models get to understanding what you're attempting to do while still fundamentally getting it wrong. Last month, after realizing I was spending more time correcting suggestions than I was saving, I stopped using them, and I'll need to see some major improvements before I feel comfortable using them again.
4 comments

This resonates with my recent experience using Bard (not the latest version of Gemini). It would produce something that initially seemed surprisingly good, but when I actually tried to run it, it turned out to be totally broken. I'd ask it to fix the error; it would magically do so, but then the code would be broken in a different way. It felt like pair programming with a junior programmer who just didn't quite get it.

This was just me interacting through text prompts. I could imagine a more integrated solution, where you provide some basic test cases and the system runs them against the code proposed by the LLM, going a long way towards improving this.
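
A rough sketch of what that kind of loop might look like, in Python. Everything here is hypothetical: `propose` stands in for whatever LLM call you'd use, and `run_test_cases` and `refine` are names made up for illustration, not part of any existing tool. It also runs the generated code with a bare `exec` and no sandboxing, so treat it as a toy rather than something to point at untrusted output.

```python
from typing import Callable, Optional, Sequence


def run_test_cases(source: str, func_name: str,
                   cases: Sequence[tuple[tuple, object]]) -> list[str]:
    """Load candidate source and return failure messages (empty list = all passed)."""
    namespace: dict = {}
    try:
        # Assumption: the candidate is plain Python defining `func_name`.
        # No sandboxing here; only use this on code you are willing to run.
        exec(source, namespace)
    except Exception as exc:
        return [f"code failed to load: {exc!r}"]

    func = namespace.get(func_name)
    if not callable(func):
        return [f"function {func_name!r} is not defined"]

    failures = []
    for args, expected in cases:
        try:
            actual = func(*args)
        except Exception as exc:
            failures.append(f"{func_name}{args} raised {exc!r}")
            continue
        if actual != expected:
            failures.append(
                f"{func_name}{args} returned {actual!r}, expected {expected!r}")
    return failures


def refine(propose: Callable[[str], str], prompt: str, func_name: str,
           cases: Sequence[tuple[tuple, object]],
           max_rounds: int = 3) -> Optional[str]:
    """Ask `propose` for code, feeding failed cases back until they pass or rounds run out."""
    feedback = prompt
    for _ in range(max_rounds):
        source = propose(feedback)
        failures = run_test_cases(source, func_name, cases)
        if not failures:
            return source
        feedback = (prompt + "\n\nThe previous attempt failed these cases:\n"
                    + "\n".join(failures))
    return None
```

You'd call it as something like `refine(my_llm_call, "Write add(a, b).", "add", [((1, 2), 3)])`, where `my_llm_call` is your own function that takes a prompt string and returns source code. The point is just that the failing cases get fed back automatically instead of by copy-pasting errors into a chat window.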

For now it seems mainly useful as a way of getting a quick first draft of some code which I then have to fix up and get fully working myself.

I see a whole lot of potential in these tools, and in some domains they are starting to deliver on some of the promise. But by and large I agree with this statement - they're actually costing me time because I have to do the research to see where they went wrong. I'm better off learning it properly and idiomatically from scratch right now.
I don't get this at all. What kind of code are you writing that you have to literally go and research what it spat out?

In my experience 90% of code is 90% the same as another piece of code in the same repo, with small differences, and Copilot will make you fly writing that code.

If you can't read the output code, does it mean the rest of your codebase is similarly unreadable?

The complexity in a codebase or a system usually comes from different parts integrating or from the overall architecture, but that's totally different from an individual function.

"What kind of code are you writing that you have to literally go and research what it spat out" - so in a recent case I was trying to work with Elasticsearch. I'm not an expert in that, so I asked it to do some things. It hallucinated a bunch, and I ended up having to dive in and learn it deeply anyway.

In that case I think I was better off not relying on the tool. I do find it nice to steer me in a direction, but the things I use tend to be niche enough that I don't get the benefit many others do.

I also have a feeling you and I are using it in different capacities.

This can happen when you are working with new codebases or APIs. For example, I recently tried to build this small GNOME extension [0], but I had zero experience with the API. So I tried ChatGPT.

Even though the structure of the code in the file was OK, it called some APIs that did not exist, and it created a new var `this._menu` for the dropdown that was not needed (`this.menu` already exists). In the end I still had to go through the GNOME extensions docs to figure out how to do it right.

Overall I don't regret using it, but the experience wasn't magical, as I guess we all want it to be.

[0]: https://github.com/onel/keyboard-cat-defense

Agreed that GitHub Copilot is not worth anyone's valuable time. You should check out the new Claude 3 Opus model; it's noticeably better. Right off the bat, it's less 'lazy' with its generations, and for me, it has been able to solve bugs that GPT-4 could not solve.

Just this week we made it available on https://double.bot (VS Code Copilot extension). We have been getting similar feedback from multiple users.

I hadn't explicitly asked that, but I've been curious about it too: are these tools currently good enough? That's why I wanted to start with whichever is currently the best, rather than bang my head against ones that aren't good.

I had tried a simple experiment generating a basic Go REST service and some XML parsing code using Bard and ChatGPT, and the results were actually not bad. But that was very simple, new code.