	by crizzlenizzle 1282 days ago
	Is it correct though? I’ve been toying around with ChatGPT for a few weeks now and I encountered a few situations in which ChatGPT was like 90% accurate at best. Things like suggesting snippets of configuration files or plugin research. It’s good to get an idea and get started somewhere, but I certainly cannot trust it blindly.

10 comments

gooseus 1282 days ago

What I've been telling everyone is that you cannot (should not) ask ChatGPT a question whose answer you cannot independently verify yourself.

This is kind of what makes it good for generating code, because everything it generates can be pretty quickly verified and validated by another machine (interpreter/compiler).

Makes it not so great for writing essays on books you didn't read, and especially for doing math you don't understand... because it can't do math AT ALL.

imtringued 1282 days ago

I was kind of thinking about this.

Let's hypothetically assume we have some sort of AGI and we can ask it to write programs and text and nothing else.

Is there anyone on this planet who would think that they don't need to look at the generated code? I mean imagine a manager simply feeding in tickets and getting a finished application out without ever knowing how it was produced.

The application is business critical and any kind of mistake could ruin his business which puts the manager at complete mercy of the AI.

Now you might say that this happens with humans as well but when humans cause problems we let other humans review and test their code.

AI causes problems? Let's add more humans. Wait a minute...

int_19h 1282 days ago

> everything it generates can be pretty quickly verified and validated by another machine

It can be verified in a sense that it builds, but that doesn't mean that it actually does what you asked it to do, or that it does it on all valid inputs. The worst bugs to track down are silent logic bugs.
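To make that concrete, here is a minimal sketch of a silent logic bug. The function names and inputs are hypothetical and not taken from the thread. The buggy version runs cleanly and returns a plausible number, yet it is only right for odd-length input.

```python
def median_buggy(values):
    """Runs without error, but is only correct for odd-length input."""
    ordered = sorted(values)
    # Bug: for even-length input this picks the upper middle element
    # instead of averaging the two middle elements.
    return ordered[len(ordered) // 2]


def median(values):
    """Reference version that handles both odd and even lengths."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2
```

Both versions agree on `[3, 1, 2]`, so a quick spot check would pass. They only diverge on even-length input such as `[1, 2, 3, 4]`, which is the kind of case an interpreter or compiler will never flag.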

danenania 1282 days ago

For math, I'm kind of surprised that it can't recognize "this is math" and then handle that with normal calculations instead of the language model. I assume we'll see that before long.

guerrilla 1282 days ago

I really want Wolfram|Alpha to be integrated into this... that'd be nice. Also if they could make W|A any faster than a glacier while they're at it that'd be great.

unoti 1282 days ago

A good trick is to ask it to translate the request into commands of your choosing. Like ask it to generate python code to make the calculation for example. Another thing that works well is to turn it into a command extraction problem, give it examples of the kinds of commands you want, and build an interpreter for those commands.

I agree, we’re not far from that, or we’re there now.
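A rough sketch of the command-extraction idea, under stated assumptions: the model is prompted to reply with one line such as `MUL 12 3.5`, and a small interpreter does the arithmetic. The command names (`ADD`, `SUB`, `MUL`, `DIV`) and the `run_command` helper are made up for illustration.

```python
import operator

# Hypothetical command vocabulary that the model is asked to emit.
COMMANDS = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "MUL": operator.mul,
    "DIV": operator.truediv,
}


def run_command(line):
    """Execute one 'NAME left right' command, rejecting anything else."""
    parts = line.split()
    if len(parts) != 3 or parts[0].upper() not in COMMANDS:
        raise ValueError(f"unrecognized command: {line!r}")
    name, left, right = parts
    # float() raises ValueError on non-numeric operands, so malformed
    # model output fails loudly instead of being executed.
    return COMMANDS[name.upper()](float(left), float(right))
```

Because only whitelisted commands are executed, the calculation is done by ordinary code, and output that does not fit the expected shape is rejected rather than trusted.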

boplicity 1282 days ago

I'm leaning towards using it for things I already know exactly how to do -- including a very clear idea of the result. In these contexts, it can save some mental workload / time.

Jeema101 1282 days ago

Don't think so. There's clearly the beginning of a while loop near the top of the obfuscated version. There's no loops at all in the 'de-obfuscated' version.

cocomutator 1282 days ago

Here's GPT's own explanation of what the purpose of that while loop is:

---

This code uses JavaScript's `eval` function to obfuscate the code by looping over an array of strings and passing them as arguments to `eval` to create a variable. It also uses an anonymous function to obfuscate the code. The code is deobfuscated by replacing the `eval` function and the anonymous function with their respective strings.

twic 1282 days ago

That explanation is also not correct! The while loop is an obfuscation gadget (of sorts), but it doesn't use eval, it uses push and shift to rotate the array. The only use of eval is 'eval("find")', which is not top grade obfuscation.
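The obfuscated script itself is not reproduced in this thread, so the following is only a generic Python analogue of the push/shift rotation described above, with a made-up string table. In JavaScript the loop body would be something like `arr.push(arr.shift())`.

```python
def rotate(table, count):
    """Move the first element to the end `count` times, in place."""
    while count > 0:
        # Python equivalent of JavaScript's table.push(table.shift()).
        table.append(table.pop(0))
        count -= 1
    return table


# Hypothetical string table; index 0 reads "log" in the source text...
strings = ["log", "find", "hello"]
rotate(strings, 2)
# ...but after rotation at runtime, index 0 holds a different string.
```

A deobfuscator that substitutes strings by their position in the source, without replaying the rotation, would put the wrong string at each lookup.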

bobkazamakis 1282 days ago

That's not generally a good enough indicator; plenty of obfuscation involves loops that otherwise aren't hit.

enjoytheview 1282 days ago

Yeah, pretty sure that loop modifies the string lookup table at runtime so you can't just statically replace it in the code. It's not strong obfuscation, but in properly deobfuscated code that loop wouldn't exist.

nabakin 1282 days ago

The while loop does not come from the original code. Probably part of the obfuscation https://twitter.com/AlexAlexandrius/status/16178998824000839...

seanhunter 1282 days ago

Yeah an example I was shown was python code to process some data. It was 30 lines of correct-looking trivial boilerplate code, except for one regex to do the actual processing. The regex was hopelessly wrong.

Clearly if you didn't know how to write the other 29 lines of code there's no way you are going to be able to debug the regex.

vlunkr 1282 days ago

The optimistic way to look at it though is that it wrote the boring 29 lines that you didn't want to write and got you straight to the actual problem that needs solving.

tetha 1282 days ago

We recently had an ad-hoc experiment like this as well: "give us basic config management code to download a service, add a systemd service for it, deploy a config and set up reloading of the service".

And it had some funny mistakes in there - something called "Reload service XYZ" and it was actually a hard restart of the service, rather silly file locations and such, sure.

But at the same time, it saved us an hour or two of boilerplate setup and even dug up a somewhat smart way to validate the configuration for this very specific service. This allowed us to jump more into understanding the service, tuning the config and setting up good tests for the setup instead of the same boring 20 resources in a config management.

I guess I could also ask if we could have some better form of service or config management which eliminates this boilerplate... but ChatGPT made our current day-to-day work a little easier there.

seanhunter 1282 days ago

Yes, and honestly I think this is the actual potential win here, especially in boilerplate-heavy languages (Java, I'm looking at you in particular). So if this turns out to be the case it could be good for programmer productivity while skewing the dev landscape towards tools, frameworks, languages etc that the prevailing AI models work well with.

Lutger 1282 days ago

90% accurate sounds impressive, and it is, but it's still 100% incorrect almost always.

euroderf 1282 days ago

But does it follow the 80/20 rule?

In this case, 80% of the answer for 20% of the effort?

hangonhn 1282 days ago

It has this really amazing and terrifying quality of being a really good bullshitter. I asked it an AWS question once and it gave me 4 very convincing sounding answers. I went to try them. 2 of them were complete bullshit, as in the commands don't even exist. The only good answer was the one I already had. It's in this uncanny valley of bullshitting. Can be quite dangerous in some situations, especially if one is lulled into trusting it.

gjvc 1282 days ago

I recognize what you are describing and I actually think that its predisposition to doing this has become worse in the past week or so.

zahrc 1282 days ago

In my experience, ChatGPT often comes up with pseudo syntax.

folkrav 1282 days ago

It often happens that ChatGPT will confidently give you something that _looks_ like what you're asking for despite it being awfully wrong - sometimes you can make it "understand" its mistake and correct it, sometimes not. It's usually not that far off, but trusting it blindly is just out of the question.

whamlastxmas 1282 days ago

I was having ChatGPT give me wildly wrong answers and when I asked for a source it provided me fake websites and confidently quoted information from those sites that have never existed

generalizations 1282 days ago

It's good enough. I know zero powershell, but I know other languages enough to understand the common grammar. With ChatGPT I'm a fairly rapid powershell programmer right off the bat - as evidenced by the script I've been writing this afternoon. I don't know any of the (overcomplicated) syntax, but now I don't have to.

nabakin 1282 days ago

This is the original code https://twitter.com/AlexAlexandrius/status/16178998824000839...

It's very similar to the deobfuscated version, but ChatGPT wrote the code in the first place

anigbrowl 1282 days ago

Is it ready for production? Maybe not. Is it amazing and inevitably going to get better? Yes. Does it make a lot of human labor redundant in the very foreseeable future? Also yes.

weego 1282 days ago

Define accurate?

It's just like any other AI system, it returns results as a best effort proposition of accurate with a % confidence that doesn't map well to binary outcomes.

So yes, it can be accurate. But there are scenarios where it must be strictly or binary correct, and it's not great at that bit.

influx 1282 days ago

I’ve had it confidently tell me to use Python libraries that don’t exist, pass parameters to methods that aren’t in the method signature, and to write code that had to be debugged and fixed.

I’m still excited to use it, but you have to know enough about coding to ensure correctness. It’s nowhere near possible for a non-coder to build a complicated app with it (so far).
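One cheap sanity check for the first two failure modes, sketched with the standard library only: confirm a suggested module can be found and that a suggested keyword appears in the function's signature. The helper names are hypothetical, and `module_exists` is meant for top-level module names.

```python
import importlib.util
import inspect
import json


def module_exists(name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


def accepts_parameter(func, param):
    """Return True if `param` is a named parameter in func's signature."""
    return param in inspect.signature(func).parameters


# json.dumps has an `indent` parameter but no parameter named `pretty`.
has_indent = accepts_parameter(json.dumps, "indent")
has_pretty = accepts_parameter(json.dumps, "pretty")
```

This only catches names that do not exist. A function that takes `**kwargs` may still accept or reject a keyword at call time, and none of this says whether the generated code does the right thing.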