Hacker News
by stormbeard 924 days ago
How do you know if these LLMs are stating facts instead of hallucinating? Also, where would these things learn about new topics?
2 comments

>How do you know if these LLMs are stating facts instead of hallucinating?

At runtime. When trying to do something I know is even slightly unusual I don't even bother with ChatGPT and friends.

Even at runtime I don't trust the results much. A large part of the act of programming is identifying and handling corner cases. You never manage to handle EVERY corner case, and the missed ones result in frustrating debugging sessions. But a competent programmer can cover enough cases up front that the time spent debugging is manageable.

But when I see people say things like "Look! I used GPT to write a functioning webapp!" - I worry that people get a false sense of "It works!" from pasting GPT's code into their compiler and seeing roughly the results they expect. That's great, but GPT in its current form spends exactly zero time "thinking" about corner cases - it's just a black box that repeatedly spits out "most likely next token". So maybe that app works 90% of the time. Or 95%. Or 99%. But you don't have much of a way to tell the difference without rigorous testing that includes thorough and well-articulated test cases. But in order to do that, you need to understand the problem you're solving in a very detailed way, and how your code reacts to it. And in order to do that, you need to... know how to write the program.
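
To make that concrete, here is a minimal sketch in Python. Everything in it (`naive_average`, `check`, and the test data) is made up for illustration and not taken from any real app. It shows code that passes the happy path and still misses a corner case:

```python
def naive_average(values):
    """Hypothetical helper: looks correct if you only try 'normal' input."""
    return sum(values) / len(values)


def check(fn, cases):
    """Run fn on each (input, expected) pair and return the inputs that fail.

    An exception counts as a failure, same as a wrong answer.
    """
    failures = []
    for arg, expected in cases:
        try:
            ok = fn(arg) == expected
        except Exception:
            ok = False
        if not ok:
            failures.append(arg)
    return failures


# Inputs you'd naturally try when "seeing if it works".
happy_path = [([1, 2, 3], 2.0), ([10], 10.0)]

# Inputs you only think of if you understand the problem in detail.
# Assumption for this sketch: an empty list should yield None.
corner_cases = [([], None), ([0], 0.0), ([-1, 1], 0.0)]

print("happy path failures:", check(naive_average, happy_path))
print("corner case failures:", check(naive_average, corner_cases))
```

The happy-path run reports nothing wrong; the empty list only shows up as a failure because someone knew to put it in the test cases.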

I think this latest wave of LLMs and generative AI is really awesome tech, and I play with it every day, because it's just so cool. But seeing people trust programs written with them worries me. Some day someone is gonna copy/pasta some LLM-generated code into mission-critical software, trusting it implicitly, and cause a tragedy.

So tell the LLM you want it to handle corner cases and it will add code to handle them. It can also generate unit tests for those corner cases. LLMs have fundamentally changed programming. There's still skill required to do it well, but we're a long ways from Borland Turbo C on DOS.
That doesn't work. It can't think through corner cases, because LLMs don't think. They aren't actually synthesizing or revising anything.
The reason people are so excited about it is because they see it work (yes, this includes whatever edge case you came up with five seconds ago - they can think too). No amount of theorised "oh they just do this thing and they don't have Qualia" is going to change reality: The model does something that people find use in.
Agree. What is 'thinking' anyway? Maybe humans are autoregressive next-token predictors as well, and thinking is just our way of saying sampling.
What works in a toy example doesn't necessarily scale / handle corner cases.
The best description I've heard is that ChatGPT is like a really fast and really eager junior programmer. Sure, you can delegate a lot of work to it, but you have to keep a close eye on it to make sure it doesn't go off the rails (that it takes corner cases into account, uses appropriate algorithms, etc.).

I tend to read along with the code it's writing and make suggestions when I see it's missing stuff, or I fill it in myself afterwards, depending. For one, it can type out the annoying bits much faster than I can!

At the moment I do keep the general plan in my head myself though, and I thoroughly read anything it generates before I run it.

Or it is just plain wrong. A few days ago I asked for an example of how to mount a secret as a file on ECS. The example it gave was for Kubernetes.
In my experience, ChatGPT is better at managing corner cases than your normal SO or book example.
You can/should try it out yourself. The same goes for textbook examples.