by netruk44 529 days ago
I wonder if we could combine ‘thinking’ models (which write thoughts out before replying) with a mechanism they can use to check their own entropy as they’re writing output.

Maybe it could eventually learn when it needs a low-entropy token (to produce a more-likely-to-be-factual statement), and then we can finally have models that actually know when to say “Sorry, I don’t seem to have a good answer for you.”
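
For anyone wondering what "checking its own entropy" could look like in practice, here is a rough sketch. It assumes you can get the raw next-token logits at each generation step. The names `token_entropy` and `should_abstain`, and the threshold of 1.0 nats, are made up for illustration; this is not how any particular model or library does it.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_entropy(logits):
    """Shannon entropy, in nats, of the next-token distribution."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def should_abstain(step_logits, threshold=1.0):
    """Hypothetical rule: abstain when the mean per-token entropy is high.

    step_logits holds one logit list per generated token.
    threshold is an arbitrary illustrative value, not a tuned one.
    """
    entropies = [token_entropy(logits) for logits in step_logits]
    return sum(entropies) / len(entropies) > threshold

# A flat distribution is "uncertain", a peaked one is "confident".
flat = [0.0, 0.0, 0.0, 0.0]
peaked = [10.0, 0.0, 0.0, 0.0]
print(should_abstain([flat]), should_abstain([peaked]))
```

Low entropy only means the model is confident, not that it is right, so a threshold like this would be at best one input to a decision to abstain.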

2 comments

There's a paper that probed how strongly a model focuses on prompt-supplied tokens when generating a response, as a signal that it was using the prompt as its source of information rather than knowledge it had been trained on. I.e., how much it was trying to lie by assuming the information in the prompt was true, as opposed to having a rich internal model of the thing being verified. It looks like it works, sort of, sometimes, when you have access to the actual labels. In the more real-world unsupervised setting, the results are better than random, sure, but not good enough to be exciting or reliable.

https://arxiv.org/html/2402.03563v1

Entropix will get its time in the sun, but for now, the LLM academic community is still 2 years behind the open source community. Min_p sampling is going to end up getting an oral at ICLR with the scores it's getting...

https://openreview.net/forum?id=FBkpCyujtS
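
For context, a minimal sketch of min_p sampling as I understand it: keep only tokens whose probability is at least `min_p` times the top token's probability, renormalize, then sample. The function names and the default of 0.1 are my own choices for illustration, not taken from the linked submission or from any specific library.

```python
import math
import random

def min_p_filter(logits, min_p=0.1):
    """Zero out tokens below min_p * max probability, then renormalize."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # The cutoff scales with the model's confidence in its top token.
    cutoff = min_p * max(probs)
    kept = [p if p >= cutoff else 0.0 for p in probs]
    norm = sum(kept)  # never zero: the top token always survives
    return [p / norm for p in kept]

def sample_min_p(logits, min_p=0.1, rng=random):
    """Draw one token index from the min_p-filtered distribution."""
    probs = min_p_filter(logits, min_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Probabilities 0.6, 0.3, 0.05, 0.05: the cutoff is 0.06, so the two
# 0.05 tokens are dropped and the remaining mass is split 2/3 and 1/3.
logits = [math.log(0.6), math.log(0.3), math.log(0.05), math.log(0.05)]
print(min_p_filter(logits))
```

Because the cutoff is relative to the top token, the filter prunes hard when the model is confident and leaves more candidates when the distribution is flat.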

> the LLM academic community is still 2 years behind the open source community

Huh, isn't it the other way around? It's thanks to the academic (and open) research on LLMs that we have an open source community around LLMs in the first place.