There is a huge backlash coming when the general public learns AI is plagued with errors and hallucinations. Companies are out there straight up selling snake oil to them right now.
Observing the realm of politics should be enough to disabuse anyone of the notion that people generally assign any value at all to truthfulness.
People will clamor for LLMs that tell them what they want to hear, and companies will happily oblige. The post-truth society is about to shift into overdrive.
It depends on the situation. People want their health care provider to be correct. The same goes for a chatbot when they are trying to get support.
On the other hand, they might not want to be moralized to, like being told that they should save more money, spend less, or go on a diet...
When dealing with regulations, law and so on, AI providing incorrect information can have a significant real-world impact, and such an impact is unacceptable. For example, you cannot have a tax authority or government chatbot be wrong about some regulation or tax law.
But tax authorities are also quite often wrong about regulations and laws. That is why objection procedures exist. The legal system is built on such fail-safes. Even judges err on the law sometimes.
If you call the government tax hotline and ask a question that is not on the list of prepared questions, what would you expect to happen? The call center personnel are certainly not experts on tax law. You would treat the answer with suspicion.
If LLMs can beat humans on the error rate, they would be of great service.
LLMs are not fail-proof machines; they are intelligent models that can make mistakes just like us. One difference is that they do not get tired, they do not have an ego, and they happily provide reasoning for all their work so that it can be checked by another intelligence (be it human or LLM).
Have we tried to establish a council of several LLMs to check answers for accuracy? That is what we do as humans for important decisions. I am confident that different models can spot hallucinations in one another.
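A minimal sketch of what such a council could look like, assuming each model is just a function from a question to an answer string. The `Model` type, `council_answer`, and the stub models are hypothetical names for illustration; no real LLM API is called here.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

# Hypothetical type: any function that takes a question and returns an answer.
Model = Callable[[str], str]


def council_answer(
    question: str, models: Sequence[Model], quorum: float = 0.5
) -> Optional[str]:
    """Return the answer shared by more than `quorum` of the models, else None."""
    if not models:
        return None
    # Normalize lightly so "Paris" and "paris " count as the same vote.
    votes = Counter(model(question).strip().lower() for model in models)
    answer, count = votes.most_common(1)[0]
    return answer if count / len(models) > quorum else None


# Stub "models" standing in for real LLM calls (assumption for illustration).
model_a = lambda q: "Paris"
model_b = lambda q: "paris "
model_c = lambda q: "Lyon"

question = "What is the capital of France?"
print(council_answer(question, [model_a, model_b, model_c]))  # paris
print(council_answer(question, [model_a, model_c]))  # None
```

Returning `None` when there is no majority is a design choice: the council abstains instead of guessing. Note that a vote like this only helps if the models' mistakes are not strongly correlated, which is an open question for models trained on similar data.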
Just to be really clear since I had to call the IRS tax hotline the other day... they are real experts over there.
And generally, people will tell me, "I'm not sure" or "I don't know". They won't just start wildly making things up but stating them in a way that sounds plausible.
“What is your error rate?”
This is the question where this subgenre of LLM ideas goes to die and be reborn as a “Co-pilot” solution.
1) Yes. MANY of these implementations are better than humans. Heck, they can be better at soft skills than humans.
2) How do you detect errors? What do you do when you give a user terrible information (convincingly)?
2.2) What do you do now, with your error rate, when your rate of creating errors has gone up since you no longer have to wait for a human to be free to handle a call?
You want the error rate, because you want to eventually figure out how much you have to spend on clean up.
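To make the volume point concrete, here is a rough back-of-the-envelope sketch. The function name and every figure in it are made up for illustration; real volumes, error rates, and cleanup costs would have to be measured.

```python
def daily_cleanup_cost(
    requests_per_day: int, error_rate: float, cost_per_error: float
) -> float:
    """Expected daily spend on fixing bad answers."""
    return requests_per_day * error_rate * cost_per_error


# All figures below are hypothetical.
# Humans: limited by how many agents are free to take a call.
human = daily_cleanup_cost(requests_per_day=1_000, error_rate=0.05, cost_per_error=20.0)
# Bot: lower error rate, but nobody has to wait, so volume is much higher.
bot = daily_cleanup_cost(requests_per_day=20_000, error_rate=0.01, cost_per_error=20.0)

print(f"human cleanup per day: {human:.0f}")
print(f"bot cleanup per day:   {bot:.0f}")
```

With these assumed numbers the bot has a fifth of the human error rate but still produces about 200 errors a day against 50, so the cleanup bill goes up, not down.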
But LLMs always advertise themselves as a "co-pilot" solution anyway. Everywhere you use LLMs they put a disclaimer that LLMs are prone to errors and you need to check the responses if you are using them for something serious.
I agree that it would be better if the LLMs showed you stats on utilization and tokens and also an estimated error rate based on these.
This is shockingly accurate. Other than professional work, AI just has to learn how to respond to the individual's tastes and established beliefs to be successful. Most people want the comfort of believing they're correct, not being challenged in their core beliefs.
It seems like the most successful AI business will be one in which the model learns about you from your online habits and presence before presenting answers.
Exactly. This is super evident when you start asking more complex questions in CS, and when asking for intermediate-level code examples.
Also the same for asking about apps/tools. Unless it is a super known app like Trello which has been documented and written about to death - the LLM will give you all kinds of features for a product, which it actually doesn’t have.
It doesn’t take long to realize that half the time all these LLMs just give you text for the sake of giving it.
Respectfully, I think we cracked basic intelligence. What do you imagine under basic intelligence?
LLMs can do homework, pass standardized exams, and give advice WITHOUT ANY SPECIFIC TRAINING.
You can invent an imaginary game, explain the rules to the LLM and let it play it. Just like that.
You can invent an imaginary computer language, explain the syntax to the LLM and it will write you valid programs in that language. Just like that.
If that is not intelligent I do not know what is. In both cases, the request you put in is imaginary, exists only in your head, there are no previous examples or resources to train on.
> Respectfully, I think we cracked basic intelligence. What do you imagine under basic intelligence?
It all depends on your definition of intelligence. Mine is the ability to solve novel problems.
AI is unable to solve novel problems, only things it has been trained against. AI is not intelligent, unless you change the very definition of the word.
I challenge you to invent an imaginary game or computer language and explain the rules to the LLM. It will learn and play the game (or write programs in your invented language), even though you just imagined it. There was no resource to train on. Nobody knows of that game or language. The LLM learns on the spot from your instructions and plays the game.
I cannot understand grad school level mathematics even if you give me all the books and papers in the world. I was not formally trained in mathematics, does that make me not intelligent?
If an LLM could invent consistent imaginary games (or anything, like a short novel, or a 3-page essay on anything it wants), maybe I would agree with you. The issue is that anything it creates is inconsistent. The issue might be an artificial limitation to avoid copyright issues, but still.
Consistency, for one. I have asked LLMs the exact same question twice in a row and got wildly different answers. Intelligence presupposes understanding. When I ask an LLM “give me the first X of Y” and it replies “I cannot give you the first X of Y because there have only been X+10, here’s the first X+5 instead”, I’m hard pressed to call it intelligent.
Have you tried specifying your field of inquiry, which was algebra? Try saying "solve this equation for me". I am a lawyer by day, so I constantly face the limitations of natural languages. The solution is to write less ambiguous prompts.
I disagree. They are not just text generators. LLMs are increasingly multimodal: they can hear and see.
We humans are also text generators based on text content. What we read and listen to influences what we write.
LLMs are at least as intelligent as us humans: they can listen, read, see, hear and communicate. With the latest additions they can also recall conversations.
They are not perfect. The main limitations are the computing power available for each request and model size.
Have you tried Claude Opus 3 or GPT 3.5 or Gemini?
Microsoft's Copilot is dumb (I think they are resource constrained). I encourage everyone to try at least the 2-3 major LLMs before passing judgement.
Asking LLMs for imaginary facts is the wrong thing here, not the hallucination of the LLMs.
LLMs have constraints: computation power and model size. Just like a human would get overwhelmed if you request too much with vague instructions, LLMs also get overwhelmed.
We need to learn how to write efficient prompts to use LLMs. If you do not understand the matter or cannot provide enough context, the LLM hallucinates.
Currently criticising LLMs on hallucinations by asking factual questions is akin to saying I tried to divide by zero on my calculator and it doesn't work. LLMs were not designed for providing factual information without context, they are thinking machines excelling at higher level intellectual work.
> akin to saying I tried to divide by zero on my calculator and it doesn't work
The big difference is that if I try to divide by zero on my calculator, it will tell me it doesn't work and perhaps even give me a useful error message. It won't confidently tell me the answer is 17.
> Currently criticising LLMs on hallucinations by asking factual questions is akin to saying I tried to divide by zero on my calculator and it doesn't work. LLMs were not designed for providing factual information without context, they are thinking machines excelling at higher level intellectual work.
I would agree with you, but they're currently billed as information retrieval machines. I think it's perfectly valid to object to their accuracy at a task they're bad at but are being sold as a replacement for.
This reminds me of movies shot in the early days of the internet. We were warned that information on the internet could be inaccurate or falsified.
We found solutions to minimize wrong information; for example, we built and maintain Wikipedia.
LLMs will also come to a point where we can work with them comfortably. Maybe we will ask a council of various LLMs before taking an answer for granted, just like we would surf a couple of websites.
That's true, LLMs do not say "I cannot understand" or "I am overwhelmed" at this stage. That is a big drawback. You need to make sure that the AI understood the request.
Some LLMs stop responding midway if the token limit is reached. That is another way of knowing that the LLM is overwhelmed. But most of the time they give lesser quality responses when overwhelmed.
Because it doesn't understand or have intelligence. It just knows correlations, which is unfortunately very good for fooling people. If there is anything else in there, it's because it was explicitly programmed in, like 1960s AI.
I disagree. AI in the 1960s relied on expert systems where each fact and rule was hand-coded by humans. As far as I know, LLMs learn on their own from vast bodies of text. There is some level of supervision, but it is not 1960s AI. That is the reason we get hallucinations as well.
Expert systems are more accurate as they rely on first order logic.
No. From my experience, many people think that AI is an infallible assistant, and even some are saying that we should replace any and all tools with LLMs, and be done with it.
The art part is actually pretty nice, because everyone can see directly if the generated art fits their taste, and going back and forth with the bot to get what you want is actually pretty fun.
It gets frustrating sometimes, but overall it's decent as a creative activity, and because people don't expect art to be knowledge.
Yes, calling an LLM "AI" was the first HUGE mistake.
A statistical model that can guess the next word is in no way "intelligent" and Sam Altman himself agrees this is not a path to AGI (what we used to call just AI).
Please define the word intelligent in a way accepted by doctors, scientists, and other professionals before engaging in hyperbole, or you're just as bad as the "AGI is already here" people. Intelligence is a gradient in problem solving and our software is creeping up that gradient in its capabilities.
Intelligence is the ability to comprehend a state of affairs. The input and the output are secondary. What LLMs do is take the input and the output as primary and skip over the middle part, which is the important bit.
No, AI also needs to fail in similar ways as humans. A system that makes 0.001% errors, all totally random and uncorrelated, will be very different in production than a system that makes 0.001% errors systematically and consistently (random errors are generally preferable).
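One possible reason random errors are easier to live with is that they can be voted away by retrying, while systematic ones cannot. A small sketch of the arithmetic, assuming three attempts per question and, in the random case, fully independent failures (a strong assumption; `majority_of_three_error` is a hypothetical helper):

```python
def majority_of_three_error(p: float, independent: bool) -> float:
    """Chance that the majority of three attempts is wrong, given per-attempt error rate p."""
    if independent:
        # Wrong when at least two of the three independent attempts fail:
        # exactly two fail (3 ways) or all three fail.
        return 3 * p**2 * (1 - p) + p**3
    # Systematic: the same inputs fail on every attempt, so retries change nothing.
    return p


p = 0.00001  # the 0.001% error rate from the comment above

print(f"random errors, best of three:     {majority_of_three_error(p, True):.2e}")
print(f"systematic errors, best of three: {majority_of_three_error(p, False):.2e}")
```

Under these assumptions, retrying pushes the random-error system from about 1 in 100,000 to about 3 in 10 billion, while the systematic one stays exactly where it was.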
> People will clamor for LLMs that tell them what they want to hear, and companies will happily oblige. The post-truth society is about to shift into overdrive.