by spencerchubb
613 days ago
it's not "just" model error.

during pre-training, there is never an incentive for the model to say "I don't know" because it would be penalized. the model is incentivized to make an educated guess.

large transformer models are really good at approximating their dataset. there is no data on the internet about what LLMs know. and even if there were such data, it would probably become obsolete soon.

that being said, maybe a big shift in the architecture could solve this. I hope!
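
to make the incentive point concrete, here is a toy sketch, not how any real model is actually trained. it assumes a standard next-token cross-entropy loss and a training text that always contains a concrete answer, never "I don't know". the candidate answers, the 90% abstain probability, and the names `guesser` and `abstainer` are all made up for illustration.

```python
import math

def cross_entropy(predicted, target):
    """Negative log-probability assigned to the token that actually appeared."""
    return -math.log(predicted[target])

# hypothetical plausible answers to a question the model is unsure about
candidates = ["1889", "1887", "1890", "1901"]
IDK = "I don't know"

# educated guess: spread all probability over the plausible answers
guesser = {c: 1.0 / len(candidates) for c in candidates}
guesser[IDK] = 0.0

# abstainer: put 90% on "I don't know", spread the rest over the answers
abstainer = {c: 0.1 / len(candidates) for c in candidates}
abstainer[IDK] = 0.9

def expected_loss(model):
    # assumption: the training text always holds one of the concrete answers,
    # each equally likely from the model's point of view
    return sum(cross_entropy(model, t) for t in candidates) / len(candidates)

guess_loss = expected_loss(guesser)
abstain_loss = expected_loss(abstainer)

print(f"educated guess loss: {guess_loss:.3f}")
print(f"'I don't know' loss: {abstain_loss:.3f}")
```

under these assumptions the abstainer always pays an extra log(10) in loss no matter how many candidates there are, because the 90% it put on "I don't know" never matches the training text.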
Suppose there are many times more posts about something one generation of LLMs can't do (arithmetic, tic-tac-toe, whatever) than posts about how the next generation of models can do that task successfully. I think this is probably the case.
While I doubt it will happen, it would be somewhat funny if training on that text caused a future model to claim it can't do something that it "should" be able to do, because it internalized that it was an LLM and "LLMs can't do X."