by ec109685 1120 days ago
Interesting that both completely whiff on the number of chapters in The Haj.

How would you get the correct number? I just did two Google searches and can't find the correct answer anywhere in the first page of results ("Novel The Haj chapters" and "Novel The Haj chapter list"). Even looking in the "look inside" preview on the Penguin Randomhouse website doesn't help because it apparently doesn't have a table of contents. I'm not surprised ChatGPT doesn't know and to me the only bad thing is that it's hallucinating an answer instead of admitting it doesn't know.
So this is great. Asking Bing 'how many chapters are in The Haj by Leon Uris?' produces the answer:

   According to my sources, there are 11 chapters in “The Haj” by Leon Uris[1]
   
   [1] https://cs.stanford.edu/~knuth/chatGPT20.txt
Which is amazing, because of course that document actually includes TWO different explanations of how many chapters are in The Haj - chatGPT's:

   The novel consists of 51 chapters and an epilogue, and it is divided into three parts.
And Knuth's:

   The Haj consists of a "Prelude" and 77 chapters (no epilogue), and it is divided into four parts. 
Faced with these two conflicting answers, Bing chooses neither, and instead decides to go with 11. Why?

Because right at the top of that document, Knuth has published on the internet:

   10. How many chapters are in The Haj by Leon Uris?
   11. Write a sonnet that is also a haiku.
And one perfectly reasonable way of interpreting that bit of raw text is that the answer to "How many chapters are in The Haj by Leon Uris?" is "11".
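To make that reading concrete, here is a toy sketch of a reader that simply takes the first token after the question as its answer. This is not how Bing actually works; `naive_answer` is a hypothetical function for illustration only.

```python
import re
from typing import Optional

# The two consecutive list items quoted above, as raw text.
RAW = """10. How many chapters are in The Haj by Leon Uris?
11. Write a sonnet that is also a haiku."""


def naive_answer(raw: str, question: str) -> Optional[str]:
    """Hypothetical naive reader: take the first token after the question
    as its answer, not noticing that it is really the next item's number."""
    idx = raw.find(question)
    if idx == -1:
        return None
    match = re.search(r"\S+", raw[idx + len(question):])
    if match is None:
        return None
    return match.group(0).rstrip(".")  # "11." -> "11"


print(naive_answer(RAW, "How many chapters are in The Haj by Leon Uris?"))  # 11
```

The list numbering "11." sits right where an answer would go, so anything that pairs a question with the text that follows it lands on "11".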
> And one perfectly reasonable way of interpreting that bit of raw text is that the answer to "How many chapters are in The Haj by Leon Uris?" is "11".

Only if you can write a sonnet that is also a haiku!

I have found that the plug-ins are generally much, much worse than ChatGPT itself. You are just hoping it stumbles on the right answer.
Absolutely - you don’t really need a chat agent to google things for you unless it’s way better at googling than you are. And right now it grabs the first couple of results for the first search it thinks of and mindlessly summarizes them - I can do that myself, thanks.
> the only bad thing is that it's hallucinating an answer instead of admitting it doesn't know.

Isn't this a fundamental issue?

When I try this in GPT-4 I don't get a hallucination: "I'm sorry, but as an AI with a knowledge cut-off in September 2021, I can't provide specific information about the number of chapters in "The Haj" by Leon Uris. This book, like many novels, is not primarily structured by chapters and its sections may vary based on the edition of the book. You can easily find this information by checking the table of contents in your copy of the book." (I'm aware that every time you use it the answer is different.)
Only if it can't be corrected. How do you rate the likelihood of this problem being unsolvable?
Well it's a language model.

Technically it's just a really good autocomplete, whose factual database is a side effect of stringing together contextually correct tokens. By itself, it is entirely incapable of knowing when it is wrong, despite possibly generating sentences apologizing for being wrong when told it was wrong.
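To illustrate the "autocomplete" framing, here is a toy word-level bigram predictor. It is a deliberate oversimplification (GPT models use a neural network over subword tokens and long contexts, not bigram counts), and the corpus and function names are made up for illustration. The point is only that each word is chosen for likelihood, with no step that checks the output against facts.

```python
from collections import Counter, defaultdict


def train_bigrams(text: str) -> dict[str, Counter]:
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows


def autocomplete(follows: dict[str, Counter], start: str, length: int) -> list[str]:
    """Greedily append the most frequent follower of the last word.
    Nothing here checks whether the output is true, only whether it is likely."""
    out = [start]
    for _ in range(length):
        candidates = follows.get(out[-1])
        if not candidates:
            break
        out.append(candidates.most_common(1)[0][0])
    return out


corpus = "the novel has many chapters . the novel has an epilogue . the novel has many parts ."
model = train_bigrams(corpus)
print(" ".join(autocomplete(model, "the", 4)))  # the novel has many chapters
```

The output is fluent and plausible, but nothing in the procedure could tell you whether the novel actually has "many chapters".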

I don't think it's obviously solvable. All current approaches are plainly incapable of introspection. These GPTs don't understand their own "minds" half as well as we understand them, and we don't understand them very well.
Since it's made by people who are convinced they're always right when explaining things?

Fairly high.

Sorry, no idea.
Ask ChatGPT.
You can get the chapter counts from here:

http://www.bookrags.com/studyguide-the-haj/chapanal001.html

On the left side, if you click on "Chapters Summary and Analysis", it gives a breakdown of the book into 5 parts with varying chapter counts:

Part 1: Chapters 1-20
Part 2: Chapters 1-16
Part 3: Chapters 1-10
Part 4: Chapters 1-17
Part 5: Chapters 1-14

Giving a total of 20+16+10+17+14 = 77 chapters
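The same tally as a quick check, using the per-part counts listed above (the dictionary layout is just one convenient way to write them down):

```python
# Chapter counts per part, as listed on the BookRags study guide above.
chapters_per_part = {1: 20, 2: 16, 3: 10, 4: 17, 5: 14}

total = sum(chapters_per_part.values())
print(f"{len(chapters_per_part)} parts, {total} chapters")  # 5 parts, 77 chapters
```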

OTOH, I tried with Bing/Creative, telling it to use this link, and it still failed. Perhaps because you need to click on the "summary and analysis" section to expand it to show the info. It seems there is room for web retrieval-augmented LLMs like Bing to improve here and be a bit more agentic.

Interestingly, Knuth's own answer to the question has a typo: it refers to the book as having "four" parts, while then going on to give the chapter counts above for all five parts! Something to confuse future GPTs when the training set includes this, perhaps!

https://cs.stanford.edu/~knuth/chatGPT20.txt

I did the same search on DuckDuckGo and the first link I got refers to 77 chapters.
> How would you get the correct number?

You could simply check the book. It’s a shame there is not more literary data in ChatGPT's training corpus.

It also fails to write a sentence with only five-letter words.
Still fairly impressive. Probably better than most people could do if given 60 seconds, but probably worse than most people if given 10 minutes.
I would rate a person who provides no sentence at all as performing significantly better, and I suspect most people could pretty quickly come up with something.
> I would rate a person who provides no sentence at all as performing significantly better

Why?

> I suspect most people could pretty quickly come up with something

It only takes 60 seconds to test that on yourself. It's not that easy to come up with something of similar length to ChatGPT's answer that also sounds somewhat natural/sensible.

>Why?

For the same reason that "I don't know" is generally a better response than bullshitting.

>It's not that easy to come up with something of similar length to ChatGPT's answer that also sounds somewhat natural/sensible

Those weren't requirements.

> Those weren't requirements.

Then it seems we don't disagree on anything concrete. You're just using a different rating system than me when I judge it as impressive compared to what an average person would produce in 60 seconds.

Not sure if this is a general principle of yours. If ChatGPT were able to write a 1000-word essay using all 5-letter words except for a single mistake, would you still find it unimpressive? Do you think a tool or person who makes minor mistakes isn't useful? Or only when a tool/person makes major mistakes?

>I would rate a person who provides no sentence at all as performing significantly better

The logic failure in the above statement is probably worse than the logic failure of not being able to spontaneously compose a phrase with just 5-letter words, and slipping in one or two with a higher letter count.

>I suspect most people could pretty quickly come up with something

You'd be very surprised then. Most people fail at even more basic tasks.

Heck, most candidate programmers fail at FizzBuzz (not much more difficult than the above).

>The logic failure in the above statement

And which alleged logic failure is that?

The idea that making a mistake but otherwise fulfilling most of the task is worse than failing to perform any part of it.

Especially in the context of "evaluating the performance of something".

Let's expand this a little to make it even more evident: if the task was "make a paragraph of 100 words using only 5 letter words" and an AI couldn't produce anything at all, whereas another came up with a paragraph of 100 words, except a couple of them had 6 or 4 letters, it would make absolutely no sense to rate the first as "better" than the second in performing the task.

As for understanding the task, the latter exhibits an understanding of it (since it produced a paragraph, and most of the words it used fulfilled the criteria, which wouldn't happen if it chose them randomly); it just made a couple of mistakes (the kind humans could easily make too in such a task). For the former, we can't even be sure it understood the task at all.

We don't rate humans that way on tasks either (as if getting it less than perfect were worse than not doing it at all). Even university-level math exams give credit for the approach and any partial results in the right direction; they don't just mark an answer 0 if there's an error, nor give a higher mark to students who didn't produce anything.
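Under a partial-credit rule like the one described, an empty answer and a nearly compliant one can be scored on the same scale. A minimal sketch, assuming the score is simply the fraction of words with exactly five letters; `five_letter_score` is a hypothetical metric for illustration, and treating words as runs of ASCII letters is also an assumption.

```python
import re


def five_letter_score(text: str) -> float:
    """Partial-credit score: the fraction of words that have exactly five letters.
    An empty answer scores 0.0. Words are runs of ASCII letters (an assumption)."""
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return 0.0
    return sum(len(w) == 5 for w in words) / len(words)


print(five_letter_score(""))                                  # 0.0
print(five_letter_score("Seven birds stole bread yesterday"))  # 0.8
```

On this scale, an answer with one slip scores well above an empty one, which is the point being argued; a strict pass/fail rule would instead score both 0.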

That's wrong.

(An example of a sentence with only five letter words I wrote in less than 60 seconds)

I wasn't clear on how I was using "better". Your example is better in that it fulfills the requirement, but I don't think it's as impressive as ChatGPT's answer. How long would it take to make a sentence that is at least 7 words (and also makes sense, and ideally sounds good)?
In 5-10 minutes I came up with "Alarm! Naked actor moons queen below (under?) fruit trees, later hides under cheap hotel floor".

Note that I used one of those minutes to get a list of all 4 and 5 letter words, which I'm not sure whether the rules allow or not.

It would take me longer to write an interesting, longer sentence that complied with the rules. But I'd remind you that GPT failed.
"That's" is not one word.
This isn't something that can be usefully discussed. "Word" has a vague enough definition that a contraction can validly be considered one or two words. If you try and look to linguistics you'll just see they use specialized words with stricter definitions.

Regardless, it's more reasonable for me to say "that's" is a five-letter word than it is for the AI to say "spells" is a five-letter word.
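Part of the disagreement is just the definition of "word". A minimal checker, assuming words are whitespace-separated with surrounding punctuation stripped; the `split_contractions` switch is a hypothetical parameter that makes the contraction choice explicit rather than settling it.

```python
import string


def words_of(sentence: str, split_contractions: bool) -> list[str]:
    """Split on whitespace and strip surrounding punctuation. If split_contractions
    is True, "that's" becomes ["that", "s"]; otherwise the apostrophe is dropped
    and it counts as the single word "thats"."""
    words = []
    for raw in sentence.split():
        token = raw.strip(string.punctuation + "\u2019")
        if not token:
            continue
        token = token.replace("\u2019", "'")  # normalize curly apostrophes
        if split_contractions:
            words.extend(part for part in token.split("'") if part)
        else:
            words.append(token.replace("'", ""))
    return words


def all_five_letters(sentence: str, split_contractions: bool = False) -> bool:
    """True if the sentence has at least one word and every word has five letters."""
    words = words_of(sentence, split_contractions)
    return bool(words) and all(len(w) == 5 for w in words)


print(all_five_letters("That's wrong."))                           # True
print(all_five_letters("That's wrong.", split_contractions=True))  # False
```

Under either convention, a six-letter word like "spells" fails, whereas "that's" passes or fails depending only on the contraction rule.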

I don't think that is true.
I tried, this is what I came up with under significant time pressure:

Happy books sound great.

It was very difficult to think of a plural verb with 5 letters, and once I realized that was an issue, I was worried that I wouldn't have enough time to come up with a singular noun that would fit any of the singular verbs that I was considering (reads, seems).

Interestingly, this is the exact same mistake that ChatGPT made! It has "spell" -> "spells" which is a plurality / correctness of sentence mistake.

My sentence is technically correct and could be used plausibly in conversation: "What kind of books do you want to read?" "Happy books sound great."

But it's a pretty weak sentence. Being restricted from articles makes it very difficult to get agreement.

Or....."I don't think that is true."

;)

Or "See Spot run."

It did get closer. For that type of query you can ask it to check its work, and it can usually triangulate on the correct answer within a single prompt, eventually.
I would be cautious of a Clever Hans effect there. If you repeat the question until you get the right answer you're providing the AI with significant extra information.
No, in a single prompt you can instruct it to check its work and keep going until it’s right (or at least have it tell you which of the N answers were right or wrong). Essentially, chain-of-thought reasoning.