Hacker News
by jchw 1106 days ago
My problem with GPT-4 so far is that it's most impressive at doing things that I don't really need any help with, but when I ask it something that I find genuinely challenging, it's error-prone and often generates suboptimal code. Of course, it's a far cry from Markov chains and earlier models, but at the end of the day, I think it's skipping too many steps right now.

To be clear, it's still really impressive. Being able to ask an AI assistant to implement e.g. a Fourier transform in Rust and getting an answer that is correct or quite close is pretty damn impressive. That's something that a lot of programmers would struggle with. But, in a lot of other ways, GPT-4 feels very skin deep, and the illusion is pierced when you realize it has just made a bunch of shit up in its impressive-sounding answer.
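For a sense of scale, here is a rough sketch of what the simplest version of that Fourier transform request might look like: a naive O(n^2) discrete Fourier transform with no external crates. This is only an illustration, not actual GPT-4 output; the function name `dft` and the `(re, im)` tuple representation are assumptions made for the example, and a real implementation would more likely use an FFT and a proper complex type.

```rust
use std::f64::consts::PI;

/// Naive O(n^2) discrete Fourier transform (illustrative sketch only).
/// Takes real-valued samples and returns one (re, im) pair per frequency bin.
fn dft(samples: &[f64]) -> Vec<(f64, f64)> {
    let n = samples.len();
    (0..n)
        .map(|k| {
            let mut re = 0.0;
            let mut im = 0.0;
            for (t, &x) in samples.iter().enumerate() {
                // X[k] = sum over t of x[t] * exp(-2*pi*i*k*t/n)
                let angle = -2.0 * PI * (k as f64) * (t as f64) / (n as f64);
                re += x * angle.cos();
                im += x * angle.sin();
            }
            (re, im)
        })
        .collect()
}

fn main() {
    // One full cosine cycle over 8 samples; the energy should land in bins 1 and 7.
    let n = 8;
    let signal: Vec<f64> = (0..n)
        .map(|t| (2.0 * PI * (t as f64) / (n as f64)).cos())
        .collect();
    for (k, (re, im)) in dft(&signal).iter().enumerate() {
        let magnitude = (re * re + im * im).sqrt();
        println!("bin {}: magnitude {:.3}", k, magnitude);
    }
}
```

Getting this far is the easy part; the places where generated code tends to slip are things like the sign convention, normalization, and numerical behavior on long inputs, which is exactly what you have to check by hand.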

2 comments

> My problem with GPT-4 so far is that it's most impressive at doing things that I don't really need any help with

That's actually what I like about it. It's pretty good at doing routine tasks that I easily could do, but which are boring and/or time-consuming. That frees me up to do more interesting stuff.

One example I've given here before is porting an extension I wrote for the Atom text editor to VS Code. I have no doubt that I could learn to write extensions for VS Code, but using ChatGPT meant that I didn't have to waste hours learning how to do that. I just took the skeleton code generated by ChatGPT, dropped in my pre-existing code, and boom... done. Since learning to write VS Code extensions isn't my primary goal (or a secondary goal, or even a tertiary goal), that was a clear win for me.

I put off switching to VS Code for far too long after Atom was EOLed, just because I didn't want to take the time to learn how to port my extension.

I think things you don't need help with are exactly the case where it shines. You give it some context and get back working (or close to it) code - make any modifications necessary and ask it to generate unit tests. It's easy to validate and saves you 15-30 mins or more.

Half the time, when I ask it for help with something completely foreign, I can't properly diagnose or fix the issues without taking the time to read the docs and/or look it up on Reddit or Stack Overflow. The other half of the time it works perfectly-ish.

I don't know about you guys - but a lot of the shit I write isn't solving a world-changing problem - if it's something already asked on Stack Overflow, you basically get a code block that does what you need, with your variables and comments and unit tests already pretty much done.

I've definitely given it a try for some of that, but the truth is I just don't have that many greenfield projects. Most of my work is maintaining existing code, and Copilot/GPT-4 haven't proven as useful for me at that.

When it comes to new, greenfield projects, I really do try to be ambitious generally speaking, and this leads me into trouble. I gave it a try to see if it could generate code for something where I needed just a slightly tricky data structure maneuver. It was a nicely self-contained problem, pretty much ideal for this sort of thing, but it still generated code that didn't compile and wasn't optimal.

To be fair, it was kinda close, which is great. But overall it wasn't worth the time: I'm pretty slow, but most of my time is NOT spent trying to implement basic things, it's spent trying to come up with what I need to implement in the first place. So I also tried spitballing design ideas with GPT-4 a little bit to get a sense of whether it's any good for that. Truth is, it's a bit hit-or-miss.

Here's my take:

- For programming tasks, it's just skipping too much stuff. I think the future for LLMs doing complex tasks is definitely going to depend heavily on huge context windows and approaches like LangChain. That said, I think today you still can't really get something that resembles a "self-driving" robot programmer, because it's just lacking the depth necessary.

- For language tasks, it's fantastic. When it comes to language tasks, it really doesn't feel like it's skipping anything; it feels like it has a genuine understanding of language. I'm not sure if we're there yet, but I am hopeful for the future of machine translation using LLMs: you can't talk to Google Translate about the context your string will be used in, but you CAN do that with GPT-4. I'm not really fluent in any language aside from English, but from cross-checking its work, I think it's at least a lot better than what you can currently do using DeepL and Google Translate for common languages. (To be fair, even LLaMA finetunes have been SHOCKINGLY good at this use case in my experience. They hallucinate a bit too much to rely on, but the fact that the hallucinations usually seem to lie close to the answer leaves me impressed given that it's a several GiB bundle of data on my SSD that runs on CPU only.)

Also, GPT-4 is just going to be epic for people learning new things. I am really excited about the ability to just auto-generate some example code for some kind of thing I'm not familiar with. Even if the code itself is nearly useless, it can probably get you pointed in the right direction and show you what terms you need to google for.