Hacker News
by BugsJustFindMe 1122 days ago
I need to know what version of ChatGPT you were using, because this is a critical piece of information that everyone just blatantly ignores, and I can only imagine that it's out of ignorance of the significance of the difference.

This is what happened when I asked ChatGPT 4...

ME

Give me hints without outright telling me the answer to the riddle: "What is always hungry, needs to be fed, and makes your hands red?"

ChatGPT 4

Hint 1: This object can be found in many homes, usually in a specific room.

Hint 2: It is not a living creature, but it does "consume" things.

Hint 3: The part about hands turning red relates to a common physical response to a certain temperature.

Hint 4: It's a crucial tool for a common household task.

Hint 5: While it can't eat in the traditional sense, you do put things into it quite frequently.

ME

What's the answer?

ChatGPT 4

The answer to the riddle is a "fire" or "fireplace." It's always hungry for more fuel (like wood), needs to be fed to keep burning, and the heat from the fire can make your hands red.

6 comments

If I were still able to edit my original comment, I would add a note at the bottom saying to read the experience as that of a casual person who downloaded an AI app after hearing about it on the news.

Such as a lawyer who’s not particularly tech savvy.

The main point is that it’s irresponsible to trust LLM output for any critical or important purpose, because it’s not perfect. But too many first-time users assume it is perfect and trustworthy at face value, when it’s not.

I don’t actually know the version since I was interacting via an unofficial iOS app using some LLM under the hood. It may not have even been ChatGPT.

Then why did you specifically claim that you asked ChatGPT? These distinctions, while small, are important. Not really the type of mistake to make on a tech-savvy forum.
The iOS app that I had this interaction on was ChatOn.

https://apps.apple.com/us/app/chaton-ai-chat-bot-assistant/i...

The subtitle on the Apple App Store is "Powered by ChatGPT & GPT-4".

If you're to believe the app's advertising, it is powered by ChatGPT and GPT-4.

I guess the point is that GPT-4 hallucinates, too. Maybe it did well on this example, but a lawyer still should not trust its output.
Maybe, but it's surprisingly good in the face of all the non-version-indicating complaints about how terrible people think it is. Mostly I doubt that the lawyer was using GPT-4, because the lawyer sounds like the kind of person who would be ignorant of the significance of the difference.
The kind of person too lazy to check the output of a computer program before submitting it to a court of law is the type of person too cheap to pay $20 for the good version of the program.

Think: Lionel Hutz.

No, checking was done!

"Oops, I'd better remove that comma".

Here is what Poe told me:

“Think about things that you might need to feed regularly, especially things that might leave stains on your hands. It might be something that you encounter frequently in your daily life.”

“The answer to the riddle is "fire". Fire is always hungry for fuel, needs to be fed with wood or other combustible materials, and can make your hands red if you handle it or if you're cooking food over an open flame.”

And second time:

“This thing is a natural phenomenon that can occur in different situations and locations. It is often associated with warmth and light, and it can be both helpful and dangerous. People have been using it for various purposes for thousands of years.”

“The answer to the riddle is "fire". Fire is a natural phenomenon that needs fuel to keep burning, and it produces heat and light that can make your hands red. Fire has been used by humans for various purposes for thousands of years, such as cooking, heating, and providing light. However, fire can also be dangerous if not handled carefully.”

I am amazed at how a free bot is basically as good as, if not better than, OpenAI’s bot. I am quite certain locally run LLMs will also be able to outcompete ChatGPT. Probably why Sam Altman is desperate to block them.

When the answer is something ridiculous or stupid, 95%+ of the time it's GPT-3.5-turbo, and the other party rarely discloses that. GPT-4 is an order of magnitude better, if not two orders of magnitude better.

It's hard to tell if the party crapping on ChatGPT is doing so out of ignorance or malice.

Finetuning with GPT-4 can't come soon enough...

Wow. That's kind of terrifying.
That's GPT-4, not ChatGPT (3.5-turbo I think).

Also, yes you can get correct information by tailoring your prompts, but that isn't the issue. The issue is that some prompts lead to bad results and confusing/incorrect answers. You changed what OP queried by providing the riddle and asking for hints to that riddle, whereas OP asked for a random riddle and then hints to that riddle.

> That's GPT-4, not ChatGPT

It absolutely is ChatGPT: the paid monthly "Plus" version, using the GPT-4 model instead of the 3.5 model.