by crizzlenizzle 1236 days ago
Is it correct though?

I’ve been toying around with ChatGPT for a few weeks now and I encountered a few situations in which ChatGPT was like 90% accurate at best. Things like suggesting snippets of configuration files or plugin research. It’s good to get an idea and get started somewhere, but I certainly cannot trust it blindly.

10 comments

What I've been telling everyone is that you cannot (should not) ask ChatGPT a question whose answer you cannot independently verify yourself.

This is kind of what makes it good for generating code, because everything it generates can be pretty quickly verified and validated by another machine (interpreter/compiler).

Makes it not so great for writing essays on books you didn't read, and especially for doing math you don't understand... because it can't do math AT ALL.

I was kind of thinking about this.

Let's hypothetically assume we have some sort of AGI and we can ask it to write programs and text and nothing else.

Is there anyone on this planet who would think that they don't need to look at the generated code? I mean imagine a manager simply feeding in tickets and getting a finished application out without ever knowing how it was produced.

The application is business critical and any kind of mistake could ruin his business which puts the manager at complete mercy of the AI.

Now you might say that this happens with humans as well but when humans cause problems we let other humans review and test their code.

AI causes problems? Let's add more humans. Wait a minute...

> everything it generates can be pretty quickly verified and validated by another machine

It can be verified in a sense that it builds, but that doesn't mean that it actually does what you asked it to do, or that it does it on all valid inputs. The worst bugs to track down are silent logic bugs.
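To make the point concrete, here is a minimal sketch of a silent logic bug. The `median_buggy` function is a made-up example, not something ChatGPT actually produced: it runs without error and even gives the right answer on some inputs, so "it builds" tells you nothing.

```python
def median_buggy(values):
    """Looks plausible and runs without error, but never sorts the input."""
    mid = len(values) // 2
    return values[mid]


def median(values):
    """Reference version: sort first, average the two middle items if even."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Both agree on an input that happens to be sorted and odd-length...
print(median_buggy([1, 2, 3]), median([1, 2, 3]))
# ...and silently disagree once the input is unsorted.
print(median_buggy([3, 1, 2]), median([3, 1, 2]))
```

Only a test with an unsorted or even-length input exposes the difference; the interpreter never complains.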

For math, I'm kind of surprised that it can't recognize "this is math" and then handle that with normal calculations instead of the language model. I assume we'll see that before long.
I really want Wolfram|Alpha to be integrated into this... that'd be nice. Also if they could make W|A any faster than a glacier while they're at it that'd be great.
A good trick is to ask it to translate the request into commands of your choosing. For example, ask it to generate Python code to make the calculation. Another thing that works well is to turn it into a command extraction problem: give it examples of the kinds of commands you want, and build an interpreter for those commands.
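A rough sketch of the command-extraction idea, assuming a made-up command vocabulary (`ADD`, `SUB`, `MUL`, `DIV`) and a hard-coded string standing in for the model's reply. No model is called here; the point is only that the arithmetic is done by ordinary code, not by the language model.

```python
import operator

# Hypothetical command vocabulary; the names are illustrative only.
OPS = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "MUL": operator.mul,
    "DIV": operator.truediv,
}


def run_command(line):
    """Parse one 'OP a b' line and compute the result with ordinary arithmetic."""
    parts = line.split()
    if len(parts) != 3 or parts[0] not in OPS:
        raise ValueError(f"unrecognized command: {line!r}")
    name, a, b = parts
    return OPS[name](float(a), float(b))


# Stand-in for text the model might return, one command per line.
model_output = "MUL 1234 5678\nADD 0.5 0.25"
results = [run_command(line) for line in model_output.splitlines()]
print(results)
```

Anything outside the vocabulary is rejected instead of guessed at, which is the main reason to prefer this over evaluating free-form output.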

I agree, we’re not far from that, or we’re there now.

I'm leaning towards using it for things I already know exactly how to do -- including a very clear idea of the result. In these contexts, it can save some mental workload / time.
Don't think so. There's clearly the beginning of a while loop near the top of the obfuscated version. There are no loops at all in the 'de-obfuscated' version.
Here's GPT's own explanation of what the purpose of that while loop is:

---

This code uses JavaScript's `eval` function to obfuscate the code by looping over an array of strings and passing them as arguments to `eval` to create a variable. It also uses an anonymous function to obfuscate the code. The code is deobfuscated by replacing the `eval` function and the anonymous function with their respective strings.

That explanation is also not correct! The while loop is an obfuscation gadget (of sorts), but it doesn't use eval, it uses push and shift to rotate the array. The only use of eval is 'eval("find")', which is not top grade obfuscation.
That's not generally a good enough indicator; plenty of obfuscation involves loops that otherwise aren't hit.
Yeah, pretty sure that loop is there to modify the lookup table for strings at runtime, so you can't just statically replace it in the code. It's not strong obfuscation, but in properly deobfuscated code that loop wouldn't exist.
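For anyone who hasn't seen the gadget, here is a Python analogue of the push/shift rotation described above. The string table and the rotation count are invented for illustration; the real script is JavaScript and its values are not reproduced here.

```python
def rotate(table, count):
    """Move the first element to the end `count` times (push(shift()) in JS)."""
    for _ in range(count):
        table.append(table.pop(0))


# Hypothetical string table as it might appear in the shipped source.
strings = ["log", "find", "hello", "console"]
rotate(strings, 3)

# Indices used later in the code only make sense after the rotation has run.
print(strings[0], strings[1])
```

Reading the table statically gives the wrong string for every index, which is why a deobfuscator has to apply the rotation once and then inline the results.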
The while loop does not come from the original code. Probably part of the obfuscation https://twitter.com/AlexAlexandrius/status/16178998824000839...
Yeah, an example I was shown was Python code to process some data. It was 30 lines of correct-looking trivial boilerplate code, except for one regex to do the actual processing. The regex was hopelessly wrong.

Clearly if you didn't know how to write the other 29 lines of code there's no way you are going to be able to debug the regex.

The optimistic way to look at it though is that it wrote the boring 29 lines that you didn't want to write and got you straight to the actual problem that needs solving.
We recently had an ad-hoc experiment like this as well: "give us basic config management code to download a service, add a systemd service for it, deploy a config and set up reloading of the service".

And it had some funny mistakes in there - something called "Reload service XYZ" and it was actually a hard restart of the service, rather silly file locations and such, sure.

But at the same time, it saved us an hour or two of boilerplate setup and even dug up a somewhat smart way to validate the configuration for this very specific service. This allowed us to jump more into understanding the service, tuning the config and setting up good tests for the setup instead of the same boring 20 resources in a config management.

I guess I could also ask if we could have some better form of service or config management which eliminates this boilerplate... but ChatGPT made our current day-to-day work a little easier there.

Yes, and honestly I think this is the actual potential win here, especially in boilerplate-heavy languages (Java, I'm looking at you in particular). So if this turns out to be the case it could be good for programmer productivity while skewing the dev landscape towards tools, frameworks, languages etc that the prevailing AI models work well with.
90% accurate sounds impressive, and it is, but it's still 100% incorrect almost always.
But does it follow the 80/20 rule?

In this case, 80% of the answer for 20% of the effort?

It has this really amazing and terrifying quality of being a really good bullshitter. I asked it an AWS question once and it gave me 4 very convincing-sounding answers. I went to try them. 2 of them were complete bullshit, as in the commands don't even exist. The only good answer was the one I already had. It's in this uncanny valley of bullshitting. Can be quite dangerous in some situations, especially if one is lulled into trusting it.
I recognize what you are describing and I actually think that its predisposition to doing this has become worse in the past week or so.
In my experience, ChatGPT often comes up with pseudo syntax.
It often happens that ChatGPT will confidently give you something that _looks_ like what you're asking for despite it being awfully wrong - sometimes you can make it "understand" its mistake and correct it, sometimes not. It's usually not that far off, but trusting it blindly is just out of the question.
I was having ChatGPT give me wildly wrong answers, and when I asked for a source it gave me fake websites and confidently quoted information from sites that have never existed.
It's good enough. I know zero PowerShell, but I know other languages enough to understand the common grammar. With ChatGPT I'm a fairly rapid PowerShell programmer right off the bat - as evidenced by the script I've been writing this afternoon. I don't know any of the (overcomplicated) syntax, but now I don't have to.
This is the original code https://twitter.com/AlexAlexandrius/status/16178998824000839...

It's very similar to the deobfuscated version, but ChatGPT wrote the code in the first place

Is it ready for production? Maybe not. Is it amazing and inevitably going to get better? Yes. Does it make a lot of human labor redundant in the very foreseeable future? Also yes.
Define accurate?

It's just like any other AI system: it returns results on a best-effort basis, with a % confidence that doesn't map well to binary outcomes.

So yes, it can be accurate. But there are scenarios where it must be strictly or binary correct, and it's not great at that bit.

I’ve had it confidently tell me to use Python libraries that don’t exist, pass parameters to methods that aren’t in the method signature, and to write code that had to be debugged and fixed.
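Some of this can be caught mechanically before running anything. A minimal sketch using only the standard library; the helper names `module_exists` and `accepts_parameter` are my own, and `shutil.copyfile` is just a convenient function to demonstrate on.

```python
import importlib.util
import inspect
import shutil


def module_exists(name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


def accepts_parameter(func, param):
    """Return True if `param` is a named parameter of `func`.

    Functions that take **kwargs can accept names not listed here,
    so a False result is a hint to check the docs, not a proof.
    """
    return param in inspect.signature(func).parameters


print(module_exists("json"))
print(module_exists("definitely_not_a_real_module_xyz"))
print(accepts_parameter(shutil.copyfile, "follow_symlinks"))
print(accepts_parameter(shutil.copyfile, "overwrite"))
```

This only confirms that a name exists, not that the suggested usage is correct, so the generated code still has to be read and tested.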

I’m still excited to use it, but you have to know enough about coding to ensure correctness. It’s nowhere near possible for a non-coder to build a complicated app with it (so far).