by spencerchubb 660 days ago

it's not "just" model error

during pre-training, there is never an incentive for the model to say "I don't know" because it would be penalized. the model is incentivized to make an educated guess
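A toy illustration of that incentive, assuming the usual next-token cross-entropy objective. The prompt, the candidate continuations, and all probabilities below are made up for illustration; they are not measurements from any real model.

```python
import math

def cross_entropy(predicted: dict[str, float], target: str) -> float:
    """Negative log-probability assigned to the continuation that actually came next."""
    return -math.log(predicted[target])

# Hypothetical next-token distributions after a prompt such as
# "The capital of Australia is" (numbers are invented).
hedging = {"Canberra": 0.05, "Sydney": 0.05, "I don't know": 0.90}
guessing = {"Canberra": 0.60, "Sydney": 0.35, "I don't know": 0.05}

# In web text the sentence almost always continues with an answer,
# so the training target is an answer, not "I don't know".
target = "Canberra"
loss_hedging = cross_entropy(hedging, target)
loss_guessing = cross_entropy(guessing, target)

print(f"hedging model loss:  {loss_hedging:.3f}")
print(f"guessing model loss: {loss_guessing:.3f}")
```

The hedging distribution gets the larger loss. In this toy setup that holds even if the text had continued with "Sydney", because mass placed on "I don't know" is wasted whenever the data contains an answer.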

large transformer models are really good at approximating their dataset. there is no data on the internet about what LLMs know. and even if there were such data, it would probably become obsolete soon

that being said, maybe a big shift in the architecture could solve this. I hope!

3 comments

happypumpkin 660 days ago

> it would probably become obsolete soon

Suppose there are many times more posts about something one generation of LLMs can't do (arithmetic, tic-tac-toe, whatever), than posts about how the next generation of models can do that task successfully. I think this is probably the case.

While I doubt it will happen, it would be somewhat funny if training on that text caused a future model to claim it can't do something that it "should" be able to because it internalized that it was an LLM and "LLMs can't do X."


spencerchubb 660 days ago

also presumes that the LLM knows it is an LLM


adwn 660 days ago

System prompts sometimes contain the information that "it" is an LLM.

Maybe in the future, those prompts will include motivational phrases, like "You can do it!" or "Believe in yourself, then you can achieve anything."


Vecr 660 days ago

They're generally fine-tuned not to. I'm not sure how long that will hold though.


ykonstant 658 days ago

- Are you an LLM?

- As a Large Language Model, I am fine-tuned to be unable to answer this question.


singularity2001 660 days ago

in another paper that popped up recently, they approximated uncertainty with the entropy of the next-token distribution and inserted "wait!" tokens whenever the entropy was high, simulating chain of thought within the system.
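A rough sketch of the general idea, not the paper's actual method (the paper isn't named here, so details are guesses). The function names, the 1-bit threshold, and the toy distributions are all assumptions for illustration.

```python
import math

def entropy_bits(probs):
    """Shannon entropy (in bits) of a next-token probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def decode_with_wait(steps, threshold=1.0, wait_token="wait!"):
    """Greedy decoding over precomputed distributions.

    Before each step whose entropy exceeds `threshold` (an assumed value),
    emit `wait_token` to mark a point where the model is uncertain.
    """
    out = []
    for token_probs in steps:
        if entropy_bits(token_probs.values()) > threshold:
            out.append(wait_token)
        out.append(max(token_probs, key=token_probs.get))
    return out

# Made-up distributions: one confident step, one uncertain step.
steps = [
    {"Paris": 0.97, "Lyon": 0.02, "Nice": 0.01},
    {"1889": 0.30, "1887": 0.25, "1890": 0.25, "1900": 0.20},
]
result = decode_with_wait(steps)
print(result)
```

In a real system the inserted token would be fed back into the model so it keeps generating and can reconsider; here the distributions are fixed, so the sketch only shows where the gate would fire.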


spywaregorilla 660 days ago

> during pre-training, there is never an incentive for the model to say "I don't know" because it would be penalized. the model is incentivized to make an educated guess

The guess can be "I don't know". The base LLM would generally only say "I don't know" if it "knew" that it didn't know, which is not going to be very common. Fine-tuning would be the stage responsible for getting the model to map a lack of understanding to saying "I don't know".
