by ulber 974 days ago
>I think we need something that at least captures the general mechanism by which LLMs predict the next token.

This bit varies a lot, since the capabilities involved in prediction depend on the data. If the text is a math book and the prompt is "... three plus five apples is a total of " the crucial capability is arithmetic (plus, of course, NL capabilities). On the other hand, if you're completing a post from /r/relationship_advice, the capabilities involved will be (vaguely) maintaining literary tone, theory of mind, psychology, etc. The capabilities needed will also vary a lot within a single text: you might need theory of mind at some crucial inflection points, but most of the time it's already clear what is going to be said and the model just has to get the wording right.

So, my take would be to really think hard about the data to understand how predictions might be made.

2 comments

The way LLMs process text is a lot like how a typical human answers (top-of-the-head answer vs. actually analyzing the question in depth). This is a recurring pattern with the current LLMs I've interacted with (e.g., A: "x" Q: "but what about y?" A: "oh, you're right, now that I [actually] think about it, really z"). Example (Anthropic's Claude):

    Q: if one basket contains 5 apples and another basket contains 3 plums, and another basket contains 2 walnuts how many fruits are there combined?
    A: The first basket contains \*5 apples\*. The second basket contains \*3 plums\*. The third basket contains \*2 walnuts\*. Adding up the numbers of fruits in each basket, there are a total of \*5 + 3 + 2 = 10 fruits combined\*.
    Q: is a walnut a fruit?
    A: You're right, a walnut is not technically a fruit. A walnut is a type of tree nut. To correctly answer the original question:
    The first basket contains 5 apples. The second basket contains 3 plums. While the third basket contains 2 walnuts, walnuts are not fruits. So the total number of fruits combined is 5 apples + 3 plums = 8 fruits.

An interesting fundamental limitation of transformer LLMs is that their architecture doesn't give the model the option to think a bit more or to review its answer, since the quantity and structure of the computations it does per token are fixed by the architecture, without any conditionals. But when you repeat the same prompt, the model gets to run a second iteration with the result of the first iteration as input data, and that can enable it to do more.

Of course, if we wanted to generalize this, we should have a structure where the model itself can decide to continue working for one more iteration, and where it can retain some temporary work-in-progress notes (currently it can see only what the previous iteration output as part of the answer).
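
To make the idea concrete, here is a minimal sketch of such a loop. Everything in it is an assumption for illustration: `iterate`, the `Step` signature, and `toy_step` are hypothetical names, and `toy_step` is a hard-coded stand-in that mirrors the basket example rather than a real model.

```python
from typing import Callable, List, Tuple

# Hypothetical interface for one "iteration" of a model: it sees the prompt
# plus its own private notes and returns (new_note, draft_answer, wants_more).
Step = Callable[[str, List[str]], Tuple[str, str, bool]]


def iterate(prompt: str, step: Step, max_iters: int = 5) -> Tuple[str, List[str]]:
    """Run `step` until it stops asking for more work or the budget runs out."""
    notes: List[str] = []  # work-in-progress notes, never shown as the answer
    answer = ""
    for _ in range(max_iters):
        note, answer, wants_more = step(prompt, notes)
        notes.append(note)
        if not wants_more:  # the "model" itself decides to stop
            break
    return answer, notes


def toy_step(prompt: str, notes: List[str]) -> Tuple[str, str, bool]:
    """Stand-in for a model, hard-coded to mirror the basket example."""
    if not notes:
        return "summed every basket; not sure walnuts count as fruit", "10", True
    return "walnuts are nuts, so leave them out", "8", False


answer, notes = iterate("how many fruits are there combined?", toy_step)
print(answer)      # 8
print(len(notes))  # 2
```

The point of the sketch is only the control flow: the stopping decision and the notes both live inside the loop, instead of depending on a human asking a follow-up question.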

There's an interesting new paper about this problem: https://arxiv.org/abs/2310.02226

"Think before you speak: Training Language Models With Pause Tokens"

The basic idea is to teach the LLM to occasionally insert a "pause" token, which produces no visible output but gives the model a chance to perform another round of operations on the way to the answer.
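
As a rough illustration of that description (not the paper's actual training or inference code), the sketch below shows a decoding loop in which pause tokens cost a forward step and stay in the context but are hidden from the visible output. The token names `<pause>` and `<eos>`, the `generate` function, and the `scripted` stand-in model are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

PAUSE = "<pause>"  # hypothetical token name
EOS = "<eos>"      # hypothetical end-of-sequence token


def generate(
    next_token: Callable[[List[str]], str],
    prompt: List[str],
    max_steps: int = 20,
) -> Tuple[List[str], int]:
    """Decode token by token, hiding pause tokens from the visible output."""
    context = list(prompt)
    visible: List[str] = []
    steps = 0
    for _ in range(max_steps):
        tok = next_token(context)  # one fixed-cost step per token
        steps += 1
        context.append(tok)        # pauses remain visible to later steps
        if tok == EOS:
            break
        if tok != PAUSE:
            visible.append(tok)
    return visible, steps


def scripted(tokens: List[str]) -> Callable[[List[str]], str]:
    """Stand-in for a model: replays a fixed token sequence."""
    it = iter(tokens)
    return lambda context: next(it)


visible, steps = generate(scripted([PAUSE, PAUSE, "8", EOS]), ["3", "+", "5"])
print(visible)  # ['8']
print(steps)    # 4
```

Here the answer is one token long, but four steps of computation were spent before the sequence ended.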

"three plus five apples is a total of..." is a really interesting example, because it doesn't actually require arithmetic at all.

A language model trained on enough text will be able to complete this just based on having encountered the pattern "three plus five SOMETHING is a total of..." enough times in its training data.

This becomes even more apparent when you work with smaller models - the 7B and similar models that can run on a laptop. They can often solve small arithmetic problems like this while having no chance at all of working with larger numbers that they have never encountered in their training data.

I really like using those smaller models as tools to better understand how this technology works.
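
A deliberately crude way to see the pattern-completion point is a lookup table with no arithmetic in it at all. This is a toy under stated assumptions, not how an LLM works internally: the `CORPUS` sentences are made up, and `train` and `complete` are hypothetical helpers.

```python
import re
from typing import Dict, List, Optional, Tuple

# Made-up "training data"; every sentence here is an assumption for the demo.
CORPUS = [
    "three plus five pears is a total of eight",
    "three plus five coins is a total of eight",
    "two plus two cats is a total of four",
]

SENTENCE = re.compile(r"(\w+) plus (\w+) \w+ is a total of (\w+)")
PROMPT = re.compile(r"(\w+) plus (\w+) \w+ is a total of")


def train(corpus: List[str]) -> Dict[Tuple[str, str], str]:
    """Memorize which word followed each 'A plus B SOMETHING' pattern."""
    table: Dict[Tuple[str, str], str] = {}
    for sentence in corpus:
        match = SENTENCE.fullmatch(sentence)
        if match:
            a, b, total = match.groups()
            table[(a, b)] = total
    return table


def complete(table: Dict[Tuple[str, str], str], prompt: str) -> Optional[str]:
    """Complete a prompt by lookup only; no addition is ever performed."""
    match = PROMPT.fullmatch(prompt.strip())
    if match is None:
        return None
    return table.get((match.group(1), match.group(2)))


table = train(CORPUS)
print(complete(table, "three plus five apples is a total of"))       # eight
print(complete(table, "forty plus seventeen apples is a total of"))  # None
```

The table handles "apples" even though that noun never appears in the corpus, yet it has nothing to say about number pairs it has not seen.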