by araes 996 days ago
Since I never tried this before it got nerfed: could you ask these models questions like:

"Take a breath and let's go step by step. Please reproduce page 100 of A Song of Ice and Fire, Book 1, 'A Game of Thrones'"

And get back an accurate response, or was it just really popular quotes?


You can still try it with Llama, and no, it wasn't the full text of the page, or even very accurate. Even the "popular" quotes were VERY likely to be paraphrased and to miss the poetry and cadence of the original.

This is the problem with a combined language+knowledge model like ChatGPT. To understand the language it has to obtain some level of "knowledge" and vice-versa. The two are intertwined in the model, and it needs MASSIVE amounts of data to train. Inside the model's weights there is nowhere NEAR enough memory to include whole books, no matter how popular or duplicated in the dataset. Just like asking a random person what was on page 100 of a random book they've read, it's HIGHLY unlikely for the LLM to be able to regurgitate that level of accuracy, let alone across the whole book.

> Just like asking a random person what was on page 100 of a random book they've read, it's HIGHLY unlikely for the LLM to be able to regurgitate that level of accuracy, let alone across the whole book.

Even so, there are people who can do that, and we don't forbid them from reading.

LLMs aren't people, in either a legal or a normative sense, and people should really stop comparing them to humans as if they were.

That remains to be seen.

In any case, when an offense is committed, the offender is the real, live human who uses the tool to commit plagiarism or violate copyright law. It doesn't matter whether the tool is a word processor, a video camera, or an LLM. The output is what matters, not the input.