Hacker News
by derwiki 1157 days ago
I uploaded a 750-page novel and asked it to summarize the plot, to which it responded:

> I'm sorry, but I cannot provide a summary of the plot of this book as the PDF file does not contain any information about the plot. The PDF file only includes the cover design, copyright information, table of contents, and details about the author. It is possible that the book itself contains a summary or synopsis of its plot.

Is there a data truncation issue?

3 comments

Hm, and then when I asked "in 3-4 paragraphs, what is this book about?", it summarized the _previous_ book by this author, which came out before the ChatGPT training cutoff. I specifically chose a sample novel that was released in late 2022 to check that this wasn't just using general ChatGPT training and was actually using the PDF I uploaded.
It is returning a series of snippets from the book, from which it cannot create a summary. That’s because of the way vector search works.

Summarizing the book requires a different approach. Usually that means condensing the book, maybe processing ten pages at a time, and then summarizing the condensed chunks.
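
To make that concrete, here is a rough sketch of the condense-then-summarize idea in Python. It is not how this product works, just an illustration. The names `chunk_pages`, `summarize_book` and `first_sentence` are made up for the example, and `first_sentence` is only a stand-in for whatever model call would do the real summarizing.

```python
from typing import Callable, List


def chunk_pages(pages: List[str], pages_per_chunk: int = 10) -> List[str]:
    """Group consecutive pages into chunks of at most `pages_per_chunk` pages."""
    return [
        "\n".join(pages[i:i + pages_per_chunk])
        for i in range(0, len(pages), pages_per_chunk)
    ]


def summarize_book(
    pages: List[str],
    summarize: Callable[[str], str],
    pages_per_chunk: int = 10,
) -> str:
    """Condense each chunk first, then summarize the condensed chunks together."""
    condensed = [summarize(chunk) for chunk in chunk_pages(pages, pages_per_chunk)]
    return summarize("\n".join(condensed))


def first_sentence(text: str) -> str:
    """Hypothetical stand-in for an LLM call: keeps only the first sentence."""
    return text.split(".")[0].strip() + "."


if __name__ == "__main__":
    pages = [f"Page {i}. Some filler text." for i in range(25)]
    print(len(chunk_pages(pages)))                # number of ten-page chunks
    print(summarize_book(pages, first_sentence))  # summary of the condensed chunks
```

For a 750-page novel the condensed chunks may still be too long for a single final pass, in which case the same step would presumably have to be repeated on the condensed text until it fits.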

This is a good question that should really be answered in an FAQ section.

Summarization is not something that Document Q&A is meant for. "Chat with your doc" = Q&A. The question is embedded, as is every paragraph in the document, and the paragraphs most similar to the question are retrieved. Unless there is a paragraph discussing a word related to "plot", it will not have a useful answer. And as you found below, it is more than capable of hallucinating an answer outside the document (because it was not prompted properly to ONLY answer using the context of the document).
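
A toy sketch of that retrieval step, in case it helps. Big assumption: real systems use learned embedding vectors, and here a bag-of-words count stands in for the embedding so the example runs with only the standard library. The function names and the prompt wording are invented for the illustration, not taken from any particular product.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. A stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(question: str, paragraphs: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Score every paragraph against the question and keep the k best."""
    q = embed(question)
    scored = [(cosine(q, embed(p)), p) for p in paragraphs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(question: str, paragraphs: List[str], k: int = 2) -> str:
    """Put the retrieved snippets in a prompt that restricts the answer to them."""
    context = "\n\n".join(p for _, p in top_k(question, paragraphs, k))
    return (
        "Answer ONLY using the context below. "
        "If the answer is not in the context, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    paragraphs = [
        "Copyright 2022 by the author. All rights reserved.",
        "Table of contents: chapter one, chapter two.",
        "Mara fled the city at dawn and hid in the hills.",
    ]
    print(top_k("Where did Mara hide?", paragraphs, k=1))
    print(top_k("Summarize the plot of this book", paragraphs, k=1))
```

The first query shares a word with one paragraph and retrieves it. The second only overlaps on filler words, so whatever comes back is a weak match, which is roughly the situation the original post ran into.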