
by refulgentis 249 days ago

It's just too much of a stretch as an analogy; it starts in the familiar SWE tarpit of human brain = lim(n matmuls) as n => infinity.

Then it glorifies wrestling in said tarpit: how do people actually compose sentences? Is an LLM thinking or writing? Can you look into how actors memorize lines before responding?

The error beyond the tarpit is that these are all ineffable questions that assume a single answer to an underspecified question across many bags of sentient meat.

Taking a step back to the start, we're wondering:

Do LLMs plan for token N + X, while purely working to output token N?

TL;DR: yes.

via https://www.anthropic.com/research/tracing-thoughts-language....

A clear, quick example they have: ask it to write a poem, take the state at the end of line 1, and scramble the feature that looks ahead to the rhyme at the end of line 2. The end of line 2 changes as a result.
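To make the logic of that experiment concrete, here is a toy sketch in Python. It is not the method from the linked post and involves no real model: the "state" is a plain dict, and `RHYME_TARGETS`, `LEAD_INS`, `read_line1`, and `write_line2` are all hypothetical names made up for illustration. The only point is the shape of the test: read the state at the end of line 1, overwrite the look-ahead slot, and check whether the end of line 2 follows the overwritten value.

```python
# Toy illustration only. A hand-written "model" with an explicit look-ahead slot.
# All names and data here are hypothetical; nothing is taken from a real LLM.

RHYME_TARGETS = {"night": "light"}  # line-1 ending -> planned line-2 ending
LEAD_INS = {
    "light": "and waited for the morning",
    "green": "the hills outside were turning",
}


def read_line1(line1):
    """Return the toy state at the end of line 1, including a planned ending."""
    last_word = line1.rstrip(",.").split()[-1]
    return {"planned_ending": RHYME_TARGETS[last_word]}


def write_line2(state):
    """Write line 2 so that its earlier words lead up to the planned ending."""
    target = state["planned_ending"]
    return f"{LEAD_INS[target]} {target}"


baseline = read_line1("She walked alone into the night,")

# The intervention: overwrite the look-ahead slot before line 2 is written.
patched = dict(baseline, planned_ending="green")

line2_baseline = write_line2(baseline)
line2_patched = write_line2(patched)
print(line2_baseline)
print(line2_patched)
```

In this toy the planning slot exists by construction, so it proves nothing about LLMs. It only shows what is being claimed: if editing the state at the end of line 1 changes both the final word of line 2 and the words leading up to it, that state was carrying information about a token several positions ahead.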

1 comments

jsrozner 249 days ago

Let's just not call it planning.

In order to model poetry autoregressively, you're going to need a variable that captures rhyme scheme. At the point where you've ended the first line, the model needs to keep track of the rhyme that was used, just like it does for something like coreference resolution.

I don't think that the mentioned paper shows that the model engages in a preplanning phase in which it plans the rhyme that will come. In fact such would be impossible. Model state is present only in so-far-generated text. It is only after the model has found itself in a poetry generating context and has also selected the first line-ending word, that a rhyme scheme "emerges" as a variable. (Now yes, as you increase the posterior probability of 'being in a poem' given context so far, you would expect that you also increase the probability of the rhyme-scheme variable's existing.)


refulgentis 249 days ago

I’m confused: the blog shows they A) predict the end of line 2 using the state at the end of line 1 and B) can choose the end of line 2 by altering state at end of line 1.

Might I trouble you for help getting from there to “such would be impossible”, where “such” is “the model…plans the rhyme to come”?

Edit: I’m surprised to be at -2 for this. I am representing the contents of the post accurately. It’s unintuitive for sure, but it’s the case.


froobius 248 days ago

I agree, the post above you is patently wrong / hasn't read the paper they are dismissing. I also got multiple downvotes for disagreeing, with no actual rebuttal.


refulgentis 248 days ago

You're my fav new-ish account, spent about 5 minutes Googling froobius yesterday tryna find more content. :) Concise, clear, no-BS takes instead of high-minded nonsense that sounds technical. HN's such a hellhole for LLM stuff: the people who are hacking ain't here, and the people who are, well, they mostly like yapping about how it connects to some unrelated grand idea they misremember from undergrad. Cheers.

(n.b. been here 16 years and this is such a classic downvote scenario the past two years. people overindexing on big words that are familiar to them, and on any sort of challenging tone. That's almost certainly why I got mine, I was the dummy who read the article and couldn't grasp the stats nonsense, and "could I bother you to help" or w/e BS I said, well, was BS)


froobius 248 days ago

> Model state is present only in so-far-generated text

Wrong. There's "model state" (I assume you mean hidden layers) not just in the generated text, but also in the initial prompt given to the model. I.e. the model can start its planning from the moment it's given the instruction, without even having predicted a token yet. That's actually what they show in the paper above...
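A minimal sketch of that point, under loud assumptions: this is a toy recurrent stand-in, not a transformer, and `step` and `encode` are hypothetical helpers invented here. It only shows that internal state can be built up from the prompt alone, before a single token has been generated.

```python
# Toy stand-in for "hidden state": an integer folded over the tokens seen so far.
# Hypothetical helpers for illustration; a real transformer keeps per-position
# activations instead, but the prompt still shapes them before any output.


def step(state, token):
    """Fold one token into the toy state."""
    return (state * 31 + sum(map(ord, token))) % 1_000_003


def encode(tokens, state=0):
    """Run the toy state over a sequence of tokens."""
    for token in tokens:
        state = step(state, token)
    return state


prompt = ["write", "a", "rhyming", "couplet"]
generated = []  # nothing has been output yet

state_after_prompt = encode(prompt)
state_other_prompt = encode(["write", "a", "haiku"])

print(len(generated), state_after_prompt, state_other_prompt)
```

With zero generated tokens the state is already non-trivial and already depends on which instruction was given, which is all that is needed for the state to hold something about tokens to come.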

> It is only after the model has found itself in a poetry generating context and has also selected the first line-ending word, that a rhyme scheme "emerges" as a variable

This is an assertion based on flawed reasoning.

(Also, these ideas should really be backed up by evidence and experimentation before asserting them so definitively.)
