by WhitneyLand
809 days ago
>Of course it's wrong, on Mars the sun is blue
I’m not an astrophysicist, but already this seems like shaky ground. Apparently at certain times, like during sunsets, the sun can appear blue on Mars, but it’s not generally true like your comment suggests. Moreover, if you ask GPT-4 about sunsets on Mars, it knows they can look blue. I’m not sure I can conclude much from the examples given.
And I'm not an astrophysicist either. I'm just playing with a stacked deck, because I have trained my news feed to give me quirky (if mostly useless) neat bits of information. For example, if anyone writes about Voyager, I'm likely to hear about it within a few days.
"Apparently at certain times, like during sunsets, the sun can appear blue on Mars" - Yes, it can. And my question was "under what conditions can the sun appear blue?" It failed and continued to fail, even in the presence of guiding hints ("But what about Mars?").
Perhaps not much can be concluded from the above test, except that ChatGPT can be coaxed into failure modes. We knew that already; the user interface clearly states it can give wrong answers.
What is fascinating to me is how people seem to convince themselves that a device that sometimes gives wrong answers is somehow going to fix its underlying algorithm, the one that permits wrong answers, so that it is always correct.
GPT-4 is an improvement, but the tools it uses to improve its answers are more like patches on top of the original algorithm. For example, as I believe you said, it now generates a math program to double-check math answers. The downside is that there is still a small chance of it generating the wrong program, and a smaller chance of that wrong program agreeing with its prior wrong answer. For a system that makes errors very infrequently, that's an effective way of reducing errors. But right now, the common man isn't testing ChatGPT for quality, just finding answers that seem good and celebrating. It's like mass confirmation bias. After the hype dies down a bit, we'll likely have a better understanding of what advances in this field we really have.
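To put rough numbers on the double-check argument, here is a minimal sketch. The function name and all three probabilities are hypothetical, made up purely for illustration; they are not measurements of GPT-4. It also assumes the three events can simply be multiplied, which is optimistic if the same misunderstanding produces both the wrong answer and the wrong program.

```python
def undetected_error_rate(p_wrong_answer, p_wrong_program, p_agree_if_both_wrong):
    """Chance that a wrong answer slips past a generated-program double-check.

    A wrong answer goes unnoticed only if the checking program is also wrong
    AND its wrong result happens to match the wrong answer.
    Assumption: the three probabilities can be chained by multiplication.
    """
    return p_wrong_answer * p_wrong_program * p_agree_if_both_wrong


# Hypothetical figures, not measured values.
baseline = 0.05  # wrong-answer rate with no double-check at all
checked = undetected_error_rate(0.05, 0.05, 0.5)

print(f"without check: {baseline:.5f}")
print(f"with check:    {checked:.5f}")
```

With these made-up inputs the undetected error rate drops well below the baseline, which is the "effective way of reducing errors" case. It never reaches zero, though, and the more correlated the two mistakes are, the closer the last two factors get to 1 and the less the check buys you.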