by cfeduke 714 days ago
Okay I guess I've just had a different experience entirely. Maybe I'm jaded by hallucinations.

The code ChatGPT generates is often bad in ways that are hard to detect. If you are not an experienced software engineer, the defects could be impossible to detect, until you/ChatGPT have gone and exposed all your customers to bad actors, or the code crashes at runtime or does something terribly incorrect.

As far as other thought work goes, I am not consulting ChatGPT over, say, a dietician or a doctor. The hallucination risk is too high. Producing an answer is not the same as producing a correct answer.

I agree. I've just seen it hallucinate too many things that on the surface seem very plausible but are complete fabrications. Basically my trust is near 0 for anything ChatGPT etc. spit out.

My latest challenge is dealing with people who trust ChatGPT to be infallible and just quote the garbage to make themselves look like they know what they are talking about.

> things that on the surface seem very plausible but are complete fabrications

LLMs are language models; it's crazy that people expect them to be correct about anything beyond surface-level language.

Yeah, I was probably being a bit too harsh in my original comment. I do find them useful, you just have to be wary of the output.

My experience actually agrees with you. It's just that the set of use cases that either:

- Are hard (or boring) to do, but easy to evaluate - for me, e.g. writing code, OCR, ideation; or

- Don't require a perfectly correct answer, but more of a starting point or map of the problem space; or

- Are very subjective, or creative, with there being no single correct answer,

is surprisingly large. It covers pretty much everything, but not everything for everyone at the same time.

> Okay I guess I've just had a different experience entirely.

I've seen both the good and the bad. I really like the good parts. Most recently, Claude Sonnet 3.5 fixed a math error in my code (I prompted it to check for it from a well-written bug report, and it fixed it ever so perfectly).

These days, it is pretty much second nature for me to pull up a new file & prompt Copilot to complete writing the entire code from my comment trails. I don't think I've seen as much change in my coding behaviour since Borland Turbo C -> NetBeans.

If your process is asking it to "write me all this code" and then slapping it in production, you're going to have a bad time. But there's intermediate ground.

>I am not consulting ChatGPT over, say, a dietician or doctor

Do you know any doctors, by chance? You have way more faith in experts than I do.

ChatGPT is just statistically associating what it’s observed online. I wouldn’t take dietary advice from the mean output of Reddit with more trust than an expert.

Doctors can be associating what they’ve learned, often with heavy biases from hypochondriacs and not enough time per patient to really consider the options.

I’ve had multiple friends get seriously ill before a doctor took their symptoms seriously, and this is a country with decent healthcare by all accounts.

Human biases are bad too.

> Doctors can be associating what they’ve learned, often with heavy biases from hypochondriacs

So true. And it's hard to question a doctor's advice, because of their aura of authority, whereas it's easy to do further validation of an LLM's diagnosis.

I had to change doctor recently when moving towns. It was only when chancing on a good doctor that I realised how bad my old doctor was - a nice guy but cruising to retirement. And my experience with cardiologists has been the same.

Happy to get medical advice from an LLM though I'd certainly want prescriptions and action plans vetted by a human.

> It was only when chancing on a good doctor that I realised how bad my old doctor was

How did you determine the new doctor is "good"?

By the time a doctor paid me enough attention to realise something was wrong I had suffered a spinal cord injury whose damage can never be reversed. I’m not falling all over myself to trust ChatGPT, but I’ve got practically zero trust in doctors either. Nobody moved until I threatened to start suing.

I sometimes use ChatGPT to prepare for a doctor's visit so I can have a more intelligent conversation, even if I may have more trust overall in my doctor than in AI.

Will be cool once we have active agents tho. Surely the learning/research process isn't that difficult even for current LLMs/similar architectures. If it can teach itself, or it can collate new (never seen) data for other models, then that's the cool part.

You realize that "online" doesn't just mean Reddit, but also Wikipedia and arXiv and PubMed and other sources perused by actual experts? ChatGPT has read more academic publications in any field than any human.

Yes, but because ChatGPT doesn’t think, it doesn’t know which arXiv papers are absolute garbage and which ones are legit.

Wikipedia does not have dietary advice. It’s an encyclopedia.

I’ve seen so many doctors advertising or recommending homeopathic “medicines” or GE-132 [1] that I would be considerably more confident in an LLM plus my own verification from reliable sources. I’m no doctor, but I know more than enough to recognize bullshit, so I wouldn’t recommend this approach to just anyone.

[1] https://pubmed.ncbi.nlm.nih.gov/1726409/

I recently needed to help a downstream team with a problem with an Android app. I never did mobile app dev before, but I was able to spin up a POC (having not coded in Java for 22 years) and solve the problem with the help of ChatGPT 4.0.

Sure, I probably would have been able to do it without ChatGPT, but it was so much easier to have something to bounce ideas off of. A safety net, if you will.

The hallucination risk was irrelevant: it did hallucinate a little early on. I told it it was hallucinating, and we moved on to a different way of solving the problem. It was easy enough to verify it was working as expected.

Seems to me this is the equivalent of fast retrieval and piecing together from a huge amount of examples in the data. This might take far more time if you were to do this yourself. That's a plus for the tools. In other words, a massively expensive (for the service provider) auto-complete.

But try to do something much simpler that has far fewer examples in the data (a typical case is something with bad documentation), and it falls apart. I even tried to use Perplexity to create a dead simple CLI command, and it hallucinated an answer (looking at the docs, it misused the parameter, and may have picked up on someone who gave an incorrect answer in the data).

It's already gotten significantly better and faster in a few years. Maybe LLMs will hit a wall in the next 5 years, but even if they do, they're still extremely useful, and there are always other ways to optimize the current technology. This is already a major development for society.

> The code ChatGPT generates is often bad in ways that are hard to detect. If you are not an experienced software engineer, the defects could be impossible to detect, until you/ChatGPT have gone and exposed all your customers to bad actors, or the code crashes at runtime or does something terribly incorrect.

I wonder about this a lot, because there's a future here where a decent amount of software engineering is offloaded to these AIs and we reach a point, in the near future, where no one really knows or understands what's going on. That seems bad. Put another way, suppose that your primary care doctor is really just using MedAI to diagnose and recommend treatment for whatever it is you went in to see him about. Over time, these sorts of shortcuts metastasize and the doctor ends up not really knowing anything about you, or the other patients, or what he's really doing as a doctor ... it's just MedAI (with whatever wrongness rate is tolerable for the insurance adjusters). Again, seems bad. There's a palpable loss of human knowledge here that's enabled by a "tool" that's allegedly going to make us all better off.

The closest analogy here is that we don't have as full-featured autopilots in airplanes as we could, because they reduce safety.

Right, good point. Maybe I'm making an argument that some features, or scope of features, should be highly regulated along the same lines.
>The code ChatGPT generates is often bad in ways that are hard to detect. If you are not an experienced software engineer, the defects could be impossible to detect

I keep hearing this, but it's incorrect. While I only know R, which is obviously a simple language, I would never type out all my code and go without testing to ensure it does what I intended before using it regularly.

So I can't imagine someone who knows a more complex language just typing it all out and integrating it into business systems at their work, or anything else, without testing it first.

Why would AI be any different?

Why the hell are AI skeptics acting like getting help from an LLM would involve not testing anything? Of course I test it! Why on earth wouldn't I? Just as I tested code made by freelancers I hired on commission before using the code I bought from them. Do AI skeptics really not test their own code? Are you all insane?

> While I only know R, which is obviously a simple language

Take it from someone who started with R: R is 100% not a simple language. If you can write good R, you're probably a surprisingly good potential SE, as R is kinda insane and inconsistent due to 50+ years of history (from S to R, etc.).

Hmmm.. I'm trying to imagine interviewing for SE and telling them I got wealthy from a crypto market-making algorithm I coded in R during Covid and the interviewer responding with anything but laughter or with silence as they ponder legal ways to question my mental health.

It's an excellent language, I think, for many reasons. One is that you can work with data within hours because, even before learning what packages or classes are, you've got native objects for data storage, wrangling, and analysis. I could even import my Excel data, and I learned the native function cheat sheet so fast that I was excited to learn what packages were, because I couldn't wait to see what I could do.

That was my experience in like 2010, maybe, and after having C++ and Python go in and out my head during college multiple times. I view R as simple only because I actually felt more helpless to keep learning it than helpless to ever learn coding at all. Worth noting that I was a Stat/Probability tutor with a Finance degree and much Excel experience.

> That was my experience in like 2010, maybe, and after having C++ and Python go in and out my head during college multiple times. I view R as simple only because I actually felt more helpless to keep learning it than helpless to ever learn coding at all. Worth noting that I was a Stat/Probability tutor with a Finance degree and much Excel experience.

Ah yeah, makes sense. That's the happy path for learning R (know enough stats etc to decode the help pages).

That being said, R is an interesting language with lots of similarities to both C-based languages and also Lisp (R was originally a Scheme interpreter), so it's surprisingly good at lots of things (except string manipulation, it's terrible at that).

Easy answer. Ask ChatGPT to write testable code, and tests for the code, then just verify the tests. If the tests don't work, have ChatGPT use the test output to rewrite the code until it does.
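For what it's worth, the loop described above can be sketched roughly like this in Python. This is only an illustration under loud assumptions: `fake_model` is a hypothetical stand-in for whatever LLM call you use (no real API is shown), `add` is a toy target function, and `run_tests` plays the part of the tests a human has actually read and verified.

```python
from typing import Callable, Optional, Tuple


def run_tests(source: str) -> Tuple[bool, str]:
    """Run human-reviewed checks against candidate source code.

    Returns (passed, message). The message is what gets fed back to the model.
    """
    namespace: dict = {}
    try:
        # exec is only acceptable here because this is a toy; real generated
        # code should run in a sandbox or a separate process.
        exec(source, namespace)
        add = namespace["add"]  # hypothetical function under test
        assert add(2, 3) == 5, f"add(2, 3) returned {add(2, 3)}, expected 5"
        assert add(-1, 1) == 0, f"add(-1, 1) returned {add(-1, 1)}, expected 0"
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, "all tests passed"


def repair_loop(
    generate: Callable[[str], str],
    check: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> Optional[str]:
    """Ask for code, test it, and feed failures back until it passes or we give up."""
    feedback = ""
    for _ in range(max_rounds):
        candidate = generate(feedback)
        passed, feedback = check(candidate)
        if passed:
            return candidate
    return None  # never converged; a human has to step in


def fake_model(feedback: str) -> str:
    """Hypothetical stand-in for an LLM: buggy first, fixed once it sees a failure."""
    if not feedback:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


result = repair_loop(fake_model, run_tests)
```

The loop is only as trustworthy as the checks in `run_tests`, which is why the "just verify the tests" step still needs someone who can read them, and `max_rounds` is there because nothing guarantees the model ever converges.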

If you can't have ChatGPT write testable code because of your architecture, you have other problems. People with bad process and bad architecture saying AI is bad because it doesn't work well with their dumpster fire systems, 100% facepalm.

> If you can't have ChatGPT write testable code because of your architecture, you have other problems.

There exist lots of reasons why code is hard to test automatically that have nothing to do with the architecture of the code, but with the domain for which the code is written and runs.

> The code ChatGPT generates is often bad in ways that are hard to detect.

Does it work, though? Yes, it does. There are many human coders who write bad code, and life goes on.