How would you get the correct number? I just did two Google searches ("Novel The Haj chapters" and "Novel The Haj chapter list") and can't find the correct answer anywhere on the first page of results. Even the "look inside" preview on the Penguin Random House website doesn't help, because it apparently doesn't include a table of contents. I'm not surprised ChatGPT doesn't know; to me the only bad thing is that it hallucinates an answer instead of admitting it doesn't know.
> And one perfectly reasonable way of interpreting that bit of raw text is that the answer to "How many chapters are in The Haj by Leon Uris?" is "11".
Only if you can write a sonnet that is also a haiku!
Absolutely: you don’t really need a chat agent to google things for you unless it’s way better at googling than you are. And right now it grabs the first couple of results for the first search it thinks of and mindlessly summarizes them. I can do that myself, thanks.
When I try this in GPT-4 I don't get a hallucination: "I'm sorry, but as an AI with a knowledge cut-off in September 2021, I can't provide specific information about the number of chapters in "The Haj" by Leon Uris. This book, like many novels, is not primarily structured by chapters and its sections may vary based on the edition of the book. You can easily find this information by checking the table of contents in your copy of the book." (I'm aware that every time you use it the answer is different.)
Technically it's just a really good autocomplete, whose factual knowledge is a side effect of stringing together contextually correct tokens. By itself it is entirely incapable of knowing when it is wrong, despite possibly generating sentences apologizing for being wrong when told it was wrong.
I don't think it's obviously solvable. All current approaches are plainly incapable of introspection. These GPTs don't understand their own "minds" half as well as we understand them, and we don't understand them very well.
On the left side, if you click on "Chapters Summary and Analysis", it gives a breakdown of the book into 5 parts with varying chapter counts:
Part 1 Chapters 1-20
Part 2 Chapters 1-16
Part 3 Chapters 1-10
Part 4 Chapters 1-17
Part 5 Chapters 1-14
That gives a total of 20 + 16 + 10 + 17 + 14 = 77 chapters.
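The total is easy to double-check. A minimal sketch, taking the per-part counts exactly as that summary page lists them (not verified against a printed edition):

```python
# Chapter counts per part, as listed on the summary page (assumed accurate).
chapters_per_part = {1: 20, 2: 16, 3: 10, 4: 17, 5: 14}

total = sum(chapters_per_part.values())
print(f"{len(chapters_per_part)} parts, {total} chapters")  # prints "5 parts, 77 chapters"
```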
OTOH, I tried with Bing/Creative, telling it to use this link, and it still failed, perhaps because you need to click on the "summary and analysis" section to expand it and show the info. It seems there is room for web retrieval-augmented LLMs like Bing to improve here and be a bit more agentic.
Interestingly, Knuth's own answer to the question has a typo: it refers to the book as having "four" parts, then goes on to give the chapter counts above for all five parts! Something to confuse future GPTs when the training set includes this, perhaps!
I would rate a person who provides no sentence at all as performing significantly better, and I suspect most people could pretty quickly come up with something.
> I would rate a person who provides no sentence at all as performing significantly better
Why?
> I suspect most people could pretty quickly come up with something
It only takes 60 seconds to test that on yourself. It's not that easy to come up with something of similar length to ChatGPT's answer that also sounds somewhat natural/sensible.
Then it seems we don't disagree on anything concrete. You're just using a different rating system from mine: I judge it as impressive compared to what an average person would produce in 60 seconds.
Not sure if this is a general principle of yours. If ChatGPT were able to write a 1000-word essay using only 5-letter words except for a single mistake, would you still find it unimpressive? Do you think a tool or person who makes minor mistakes isn't useful? Or only when a tool/person makes major mistakes?
>I would rate a person who provides no sentence at all as performing significantly better
The logic failure in the above statement is probably worse than the logic failure of not being able to spontaneously compose a phrase with just 5-letter words and slipping in one or two with a higher letter count.
>I suspect most people could pretty quickly come up with something
You'd be very surprised then. Most people fail at even more basic tasks.
Heck, most candidate programmers fail at fizz-buzz (not that much more difficult than the above).
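For reference, fizz-buzz is the classic screening exercise: go through the numbers 1 to n, but output "Fizz" for multiples of 3, "Buzz" for multiples of 5, and "FizzBuzz" for multiples of both. A minimal Python version:

```python
def fizzbuzz(n: int) -> list[str]:
    """Return the fizz-buzz sequence for 1..n as strings."""
    out = []
    for i in range(1, n + 1):
        if i % 15 == 0:          # divisible by both 3 and 5
            out.append("FizzBuzz")
        elif i % 3 == 0:
            out.append("Fizz")
        elif i % 5 == 0:
            out.append("Buzz")
        else:
            out.append(str(i))
    return out


if __name__ == "__main__":
    print("\n".join(fizzbuzz(15)))
```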
The logic failure is the idea that making a mistake while otherwise fulfilling most of the task is worse than failing to perform any part of it.
Especially in the context of "evaluating the performance of something".
Let's expand this a little to make it even more evident: if the task were "make a paragraph of 100 words using only 5 letter words" and an AI couldn't produce anything at all, whereas another came up with a paragraph of 100 words, except a couple of them had 6 or 4 letters, it would make absolutely no sense to rate the first as "better" than the second in performing the task.
As for understanding the task, the latter exhibits an understanding of it (since it produced a paragraph, and most of the words it used fulfilled the criteria, which wouldn't happen if it chose them randomly); it just made a couple of mistakes (the kind humans could easily make too in such a task). For the former, we can't even be sure it understood the task at all.
We don't rate humans on tasks that way either (as if getting it less than perfect were worse than not doing it at all). Even university-level math tests give credit for the approach and any partial results in the right direction; they don't just mark an answer 0 if there's an error, nor give a higher mark to students who didn't produce anything.
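To make the partial-credit idea concrete, here is a sketch of one possible scoring rule. The function name `five_letter_score` and the rule itself (fraction of words with exactly 5 letters, apostrophes not counted, empty answer scores 0) are hypothetical, chosen only to illustrate the argument:

```python
import re


def five_letter_score(text: str) -> float:
    """Hypothetical partial-credit score: fraction of words with exactly 5 letters.

    Words are runs of letters, optionally joined by internal apostrophes
    (so "that's" is one word); apostrophes are not counted as letters.
    An empty answer scores 0.0, so it can never beat a flawed attempt.
    """
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)
    if not words:
        return 0.0
    hits = sum(1 for w in words if sum(ch.isalpha() for ch in w) == 5)
    return hits / len(words)


print(five_letter_score(""))                          # 0.0
print(five_letter_score("Every house needs light."))  # 1.0
print(five_letter_score("Every house needs lights"))  # 0.75
```

Under this rule, an answer with one 6-letter slip still scores well above producing nothing.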
I wasn't clear on how I was using "better". Your example is better in that it fulfills the requirement, but I don't think it's as impressive as ChatGPT's answer. How long would it take to make a sentence that is at least 7 words long (and that also makes sense, and ideally sounds good)?
This isn't something that can be usefully discussed. "Word" has a vague enough definition that a contraction can validly be considered one or two words. If you look to linguistics, you'll just find that they use specialized terms with stricter definitions.
Regardless, it's more reasonable for me to say "that's" is a five-letter word than it is for the AI to say "spells" is a five-letter word.
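The difference shows up as soon as you count mechanically: "that's" has six characters but five letters, while "spells" has six of each. A small sketch (counting only alphabetic characters is an assumption about how to treat contractions, not a standard definition):

```python
def letter_count(word: str) -> int:
    """Count alphabetic characters only, so apostrophes in contractions are ignored."""
    return sum(1 for ch in word if ch.isalpha())


for w in ["that's", "spells"]:
    print(w, len(w), letter_count(w))
# that's 6 5
# spells 6 6
```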
I tried; this is what I came up with under significant time pressure:
Happy books sound great.
It was very difficult to think of a plural verb with 5 letters, and once I realized that was an issue, I was worried that I wouldn't have enough time to come up with a singular noun that would fit any of the singular verbs that I was considering (reads, seems).
Interestingly, this is the exact same mistake that ChatGPT made! It has "spell" -> "spells", which is a plurality / sentence-correctness mistake.
My sentence is technically correct and could be used plausibly in conversation: "What kind of books do you want to read?" "Happy books sound great."
But it's a pretty weak sentence. Not being able to use articles makes subject-verb agreement very difficult.
It did get closer. For that type of query, you can ask it to check its work, and you can usually triangulate on the correct answer within a single prompt, eventually.
I would be cautious of a Clever Hans effect there. If you repeat the question until you get the right answer you're providing the AI with significant extra information.
No, in a single prompt, you can instruct it to check its work and keep going until it’s right (or at least have it tell you which of the N answers were right or wrong). Essentially, chain-of-thought reasoning.
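A rough sketch of what such a single self-checking prompt might look like. The `build_self_check_prompt` helper and its wording are hypothetical, not a tested recipe, and the code only builds the prompt text; it does not call any model API:

```python
def build_self_check_prompt(task: str, n_candidates: int = 3) -> str:
    """Build one prompt asking the model to draft, verify, and revise in a single turn.

    Hypothetical wording: the point is that the checking instructions travel with
    the question, rather than the user re-asking until the answer looks right.
    """
    return "\n".join([
        f"Task: {task}",
        f"1. Write {n_candidates} candidate answers, numbered 1 to {n_candidates}.",
        "2. For each candidate, check it against every requirement of the task, "
        "step by step, and state whether it passes or fails.",
        "3. If none pass, write a new candidate and check it the same way; "
        "repeat until one passes.",
        "4. Finish with the first candidate that passed, labeled FINAL.",
    ])


print(build_self_check_prompt(
    "Write a sentence of at least 7 words using only 5-letter words."
))
```

Because all of the checking instructions go in up front, the user never signals which answer is wrong, which sidesteps the Clever Hans concern raised above.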