by benkaiser
528 days ago
To add a little context here, I (the author) understand LLMs aren't the right tool (at least by themselves) for verbatim verse recall. The trigger for me doing the tests was seeing other people in my circles blindly trusting that ChatGPT was outputting verses accurately. My background allowed me to understand why that is a sketchy thing to do, but many people do not, so I wanted to see how worried we really should be.
|
I think, though, that an important part of communicating about LLMs is talking about what they are designed to do and what they aren't. This matters because humans want to anthropomorphise, and LLMs are way past good enough to make that easy, but, as with pets, not being human means they won't live up to expectations. While your findings show that current large models are quite good at verbatim answers (for one of the most widely reproduced texts in the world), this is likely in no small part down to luck and the current way these models are trained.
My concern is that the takeaway from your article is somewhere between "most models reproduce text verbatim" and "large models reproduce popular text verbatim", when it should probably be that LLMs are not designed to reproduce text verbatim and that you should just look up the text, or at least use an LLM that cites its references correctly.