by ulber
974 days ago
>I think we need something that at least captures the general mechanism by which LLMs predict the next token.

This bit varies a lot, since the capabilities involved in prediction depend on the data. If the text is a math book and the prompt is "... three plus five apples is a total of ", the crucial capability is arithmetic (plus, of course, natural-language capabilities). On the other hand, if you're completing a post from /r/relationship_advice, the capabilities involved will be (vaguely) maintaining literary tone, theory of mind, psychology, etc. The capabilities needed also vary a lot within a single text: you might need theory of mind at some crucial inflection points, but most of the time it's already clear what is going to be said and the model just has to get the wording right. So my take would be to really think hard about the data to understand how predictions might be made.