Hacker News new | ask | show | jobs
by ArenaSource 1176 days ago
If you put GPT-4 on a loop with access to the shell it manages to do whatever is needed to finish the job

https://raw.githubusercontent.com/jla/gpt-shell/assets/examp...

2 comments

My experience with GPT-4 has been really disappointing. It didn't feel like a step up from 3.5.

As an example, I've been trying to use it to learn Zig since the official docs are ... spartan. And I've said, "here's my code, here's the error, what's wrong with it?" and it will go completely off the rails suggesting fixes that don't do anything (or are themselves wrong).

In my case, understanding/fixing the code would have required GPT-4 to know the difference between allocating on the stack/heap and the lifetimes of pointers. It never even approached the right solution.

I haven't yet gotten it to help me in even a single instance. Every suggestion is wrong or won't compile, and it can't reason through the errors iteratively to find a fix. I'm sure this has to do with a small sample of Zig code in its training set, but I reckon an expert C coder could have spotted the bug instantly.

If you are using GPT-4 to try to deal with the fact that technical documentation on the public internet is sparse for your topic of interest, you are likely to be disappointed, since GPT-4’s training set likely has the same problem, so you are, in effect, hoping it will fill in gaps in missing data, prompting hallucinations.

It’ll be much better on subjects where there is too much information on the public internet for a person to efficiently manage and sift through.

I think you're right. My hope was that it could reason through the problem using knowledge from related sources like C and an understanding below the syntax of what was actually happening.

But it most certainly did not.

Depending on what you're doing, you might find few-shot techniques useful.

I used GPT 3.0 to maintain a code library in 4 languages, I'd write Dart (basically JS, so GPT knows it well), then give it a C++ equivalent of a function I had previously translated, and it could do any C++ from there.

1. GPT4 is learning from the same spartan docs as you, likely

2. GPT4's training data likely doesn't include significant Zig use, since large parts of its training data cut off a few years ago. I use Rust and it doesn't know about any recently added Rust features, either.

This has interesting implications because it means people will gravitate towards languages/frameworks/libraries that GPT knows well, which means even less training data will be generated for the new stuff. This is a form of value lock-in.

> This has interesting implications because it means people will gravitate towards languages/frameworks/libraries that GPT knows well, which means even less training data will be generated for the new stuff. This is a form of value lock-in.

That's the kind of problem that most people are just failing to see. The usage of this models might not in itself be problematic, but the changes that it bring are often unexpected and too deep for us to see clearly now. And yet, people are rushing towards them at full speed.

It's inevitable, really. But that's like saying Washing Machine changed fashion. It might have, but the changes aren't all that abominable, either.
GPT-4 is just regurgitating what its "learned" from previously scraped content on the Internet. If somebody didn't answer it on StackOverflow before 2021, it doesn't know it. It can't reason able anything, it doesn't "understand" stacks or pointers.

That said its really good at regurgitating stuff from StackOverflow. But once you step beyond anything that someone has previously done and posted to the Internet, it quickly gets out of its depth.

It's a step up by an order of magnitude for certain things. Like chess. It is really good at chess actually. But not programming. Seems maybe marginally better on average. Worse in some ways.
It can't learn zig without plenty of samples
Yeah I can't wait to get API access to gpt-4, it is a stepwise more capable based on the stuff I've done with chatgpt on gpt-4.

That said, even gpt-3.5 will try multiple routes to get to the same endpoint. It seems to get distracted pretty easily though.

One demo of gpt-4’s superiority over gpt-3 is to come up with a prompt that determines the language of some given text.

I couldn’t figure out a gpt-3 prompt that could handle “This text is written in French” correctly (it thinks it’s written in French), but with gpt-4 you can include in the prompt to disregard what the text says and focus on the words and grammar that it uses.

> It seems to get distracted pretty easily though.

That’s true, gpt-4 is way more easy to guide with the system messages and it doesn’t forget the instructions as the conversation goes on.