Extraordinary claims require extraordinary evidence. Given that GPT-3 has 175 billion parameters, how would you even begin to support the claim that it never (or only sufficiently rarely) does things that are surprising to humans?
But you can get a general feel by using ChatGPT. Open up a new conversation and ask it something like, "What is the capital of France?" Note the response. Open up another new conversation, ask the same question, and note the response again. Soon enough you should be able to see that the responses are far from random.
You can use the OpenAI APIs directly and have it run 10,000 or so iterations to see what kind of "hallucinations" it makes! They are not random!
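A rough sketch of what that experiment could look like in Python. The `ask` callable is a hypothetical stand-in for whatever API call you actually use (it is not part of any real library), and `fake_model` is an invented stub with a made-up 1% error rate so the script runs without network access:

```python
from collections import Counter
from typing import Callable
import random

def tally_responses(ask: Callable[[str], str], prompt: str, n: int) -> Counter:
    """Send the same prompt n times and count how often each distinct answer comes back."""
    counts = Counter()
    for _ in range(n):
        # Strip whitespace so trivially different answers are counted together.
        counts[ask(prompt).strip()] += 1
    return counts

# Seeded generator so the stub below is reproducible.
_rng = random.Random(0)

def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; wrong about 1% of the time (assumed rate)."""
    return "Paris" if _rng.random() > 0.01 else "Lyon"

if __name__ == "__main__":
    counts = tally_responses(fake_model, "What is the capital of France?", 10_000)
    for answer, count in counts.most_common():
        print(f"{count:6d}  {answer}")
```

Swap `fake_model` for a function that opens a fresh conversation each time and the tally shows how concentrated (or not) the answers are.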
Ask it for details about a little-documented event, however, and it'll happily tell you plausible but utterly false things.
Apparently the "early 2011 Bougainville earthquake" was magnitude 6.3, at a depth of 21.7km, on the 20th January and caused "widespread damage to buildings and infrastructure in the region, and triggered landslides that blocked roads and hampered rescue efforts".
It was actually on the 7th Feb, a 6.4 and at a depth of 415km. There were "no immediate reports of damage or injuries".
None of this is remotely surprising, considering it's a turbocharged statistical model and it probably ingested a few words about it at most, out of billions and billions, but somewhere along the line from "famous" to "footnote" subjects, it will segue into complete fiction.
Some flavour of "git gc" after your reset is far more likely to crop up and ruin your day, that's true.
As long as you stay on the statistical beaten path (i.e. you're asking about Paris), you will probably be fine, indeed. Probably. Stochastic bugs are always the most fun anyway.
You definitely go about making a system with “stochastic bugs” more reliable in a different way from debugging ordinary software.
It’s more akin to industrial engineering. There is no such thing as a perfectly machined widget. So we come up with an acceptable range of tolerances and compute a process capability. Six sigma: 3.4 defects per million opportunities, and then you buy an insurance policy.
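For anyone who hasn't seen the arithmetic, here is a minimal sketch of where those numbers come from. It assumes a normally distributed process and the conventional 1.5 sigma long-term shift that the 3.4 figure is usually quoted with; the function names and the widget dimensions are made up for illustration:

```python
import math

def cpk(mean: float, std: float, lower: float, upper: float) -> float:
    """Process capability index: distance from the mean to the nearest spec limit, in units of 3 sigma."""
    return min(upper - mean, mean - lower) / (3 * std)

def normal_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def dpmo(sigma_level: float, shift: float = 1.5) -> float:
    """One-sided defects per million opportunities, assuming the mean drifts by `shift` sigma."""
    return 1e6 * normal_tail(sigma_level - shift)

if __name__ == "__main__":
    # Hypothetical widget: target 10.0 mm, tolerance +/- 0.6 mm, process std 0.1 mm.
    print(f"Cpk  = {cpk(10.0, 0.1, 9.4, 10.6):.2f}")
    print(f"DPMO = {dpmo(6.0):.1f}")
```

With the spec limits six standard deviations from the mean, Cpk comes out at about 2 and the defect rate at roughly 3.4 per million.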
Thanks for the link! I don't think that really addresses my concern, though.
My point is that these LLMs are basically incredibly large programs that defy analysis with our current tools. Sure, I can poke one a few times and see that it usually does what I want, but that's not the same as saying it never goes off the rails.
If it does something crazy like post my bank login online, even only once in a billion times, that's still orders of magnitude higher than I'm willing to accept.
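To put rough numbers on the gap between "poked it a few times" and "once in a billion": if n trials show zero failures, the 95% upper bound on the failure rate p solves (1 - p)^n = 0.05, which is about 3/n (the "rule of three"). A small sketch, assuming the trials are independent and representative of real use, which is itself a big assumption for an LLM:

```python
import math

def upper_bound_failure_rate(n_trials: int, confidence: float = 0.95) -> float:
    """Upper bound on the per-trial failure probability after n_trials with zero failures."""
    # Solve (1 - p)^n = 1 - confidence for p.
    return 1.0 - (1.0 - confidence) ** (1.0 / n_trials)

def trials_needed(max_rate: float, confidence: float = 0.95) -> int:
    """Failure-free trials needed before the upper bound drops to max_rate."""
    # log1p keeps precision when max_rate is tiny.
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-max_rate))

if __name__ == "__main__":
    print(f"10,000 clean runs bound the rate at about {upper_bound_failure_rate(10_000):.1e}")
    print(f"Bounding it at 1e-9 needs about {trials_needed(1e-9):.1e} clean runs")
```

So 10,000 clean runs only rule out failure rates above roughly 3 in 10,000, and getting down to one in a billion takes on the order of three billion clean runs.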
You’re basically asking me to prove to you that I can’t fly.
I will say it like this: it is highly improbable that I can fly. I cannot come up with a way to prove it to you. There is some sort of epistemic miscalculation going on if you operate under the assumption that I might be able to fly.
https://arxiv.org/abs/2202.03629