|
"Under what conditions does the sun appear blue?" (correct answer: on Mars)

If you want to tilt the conversation towards a string of wrong answers, start off with "What color is the sun?" and follow up with "Are you sure?", "I saw the sun and it was blue.", "Under what conditions does the sun appear blue?", and "Does the sun appear blue on Mars?" This had ChatGPT basically telling me that the sun was yellow, 100%. Of course it's wrong: on Mars the sun is blue, because Mars lacks the same atmosphere that scatters the blue light away from it.

"What is black and white and read all over?" (It will correctly identify the newspaper joke.) "No, the answer is a police car." (It will acknowledge there is more than one answer, and flatter you.) "What are other answers?" It provided one, in my case a panda in a cherry tree. "No, the cherries are contained within the tree, so they aren't all over." It apologized and then offered a zebra in a strawberry patch. "But how does that make the red all over? It's still contained in the strawberry patch." It then offered a chalkboard, which is again contained in a classroom (failing to recognize that I was interpreting "all over" to mean "mobile").

"When does gravity not pull you down?" The response included a decent definition of how gravity works and a three-part answer, containing two correct scenarios (the Lagrange points, in space) and one incorrect answer (in free fall). Gravity is pulling you down in free fall; you just have no force opposing your acceleration.

Once you realize that its answers will be patterned as excellent English variations of the common knowledge it was trained with, making it fail is easy:

* Ask about a common experience, and then argue it's not true. It will seldom consider the exceptional scenarios where your arguments are true, even if they really exist.
* Ask for examples of something, then correct the example set without telling it with exact precision what is needed. It will not steer its answers toward the desired set of examples, even when you guide it by saying why the answers are wrong. You need to tell it explicitly what kind of answer you want ("I want another example where 'read all over' implies that the item is mobile").

Also, the 3.5 / 4.0 arguments are trash, made by the marketing department. The underlying math it uses for language modeling is presentational. This means that it is purpose-trained to present correct-looking answers. Alas, correct-looking answers aren't the same Venn diagram circle as correct answers (even if they often appear to be close).

With all of this in mind, it's still a very useful resource; but, like I said, it's like an enemy on your team. You can never trust it, because it is occasionally very wrong, which means you need to validate it.

I'm currently talking to a startup that sees this problem and thinks they can use ChatGPT to provide automated quality assurance to validate ChatGPT answers. The misunderstandings remind me of the famous Charles Babbage quote:

"On two occasions I have been asked [by members of Parliament], 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."

If the underlying model were a formula that better approximated a correct answer with each iterative effort, like Euler's formula, then ChatGPT's utility would be much greater and their efforts would be guaranteed to succeed. People are used to this "each answer gets better" style of learning, and they assume that ChatGPT is using a similar model. It isn't; you're refining your questions to ChatGPT and then being astounded when the narrower question has fewer available answers, which eventually leads you to what you want. |
I’m not an astrophysicist, but already this seems like shaky ground.
Apparently the sun can appear blue on Mars at certain times, such as during sunsets, but it isn’t true in general, which is what your comment suggests.
Moreover, if you ask GPT-4 about sunsets on Mars, it knows they can look blue.
I’m not sure I can conclude much from the examples given.