| HN Mirror


by danpalmer 577 days ago

Thanks for the response, it does sound like you've seen similar treatment of LLMs by others to what I've observed.

I think, though, that an important part of communicating about LLMs is talking about what they are designed to do and what they aren't. This is important because humans want to anthropomorphise, and LLMs are way past good enough for this to be easy, but, as with pets, not being human means they won't live up to expectations. While your findings show that current large models are quite good at verbatim answers (for one of the most widely reproduced texts in the world), this is likely in no small part down to luck and the current way these models are trained.

My concern is that the takeaway from your article is somewhere between "most models reproduce text verbatim" and "large models reproduce popular text verbatim", where it should probably be that LLMs are not designed to be able to reproduce text verbatim and that you should just look up the text, or at least use an LLM that cites its references correctly.

2 comments

filoeleven 575 days ago

Check out this video at the 22:20 mark. The goal he’s pursuing is to have the LLM recognize when it’s attempting to make a factual statement, and to quote its training set directly instead of just going with the most likely next token.

https://youtu.be/b2Hp0Jk9d4I?si=SwfKJrck5_0LPgTH


nyclounge 577 days ago

>or at least use an LLM that cites its references correctly.

Are there implementations where the LLM just outputs the text of the references (or the first 100 words)? I'm sure someone has implemented this already.


danpalmer 577 days ago

Gemini cites web references, NotebookLM cites references in your own material, and the Gemini APIs have features around citations and grounding in web search content. I'm not familiar with OpenAI or Anthropic's APIs but I imagine they do similar, although I don't think ChatGPT cites content.

All these are doing, however, is fact-checking and linking out to the sources used for that fact-checking. They aren't extracting text verbatim from a database. You could probably get close with RAG techniques, but you still can't guarantee it, in the same way that if you ask an LLM to repeat your question back to you exactly, you can't guarantee that it will do so verbatim.

Verbatim reproduction would be possible with some form of tool use, where rather than returning, say, a bible verse, the LLM returns some structure asking the orchestrator to run a tool that inserts a bible verse from a database.
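
To make the tool-use idea concrete, here is a rough sketch of what that could look like. Everything in it is an assumption for illustration: `fake_model_reply` stands in for a real LLM call, `insert_verse` is a made-up tool name, and `VERSES` is a tiny in-memory stand-in for a real database. The point is only that the model emits a reference and the orchestrator, not the model, supplies the text.

```python
import json

# Hypothetical stand-in for a verse database; a real system would query a proper store.
VERSES = {
    "John 11:35": "Jesus wept.",
}


def fake_model_reply(question: str) -> str:
    """Stand-in for an LLM call (assumption): returns a tool request, not the verse text."""
    return json.dumps({"tool": "insert_verse", "reference": "John 11:35"})


def orchestrate(question: str) -> str:
    """Run the model, then resolve any tool request against the database."""
    reply = json.loads(fake_model_reply(question))
    if reply.get("tool") == "insert_verse":
        ref = reply["reference"]
        text = VERSES.get(ref)
        if text is None:
            # The model asked for something we don't have; never let it fill the gap itself.
            return f"No stored text for {ref}."
        # The verse text comes straight from the database, so it is verbatim by construction.
        return f"{ref}: {text}"
    # No tool requested: fall back to whatever free text the model produced.
    return reply.get("text", "")


print(orchestrate("What is the shortest verse?"))
```

The model can still pick the wrong reference, but whatever text is shown is copied from the store rather than generated token by token.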
