| HN Mirror


	by ToValueFunfetti 29 days ago
	What I'm saying is that this is incorrect. An "idea" exists within a model before it generates tokens. This property does not distinguish humans from LLMs. Additionally, "from learned stats" doesn't narrow things down; it describes a wide variety of things. I'm not aware of any other way to acquire knowledge from measurements. I'd bet that humans do this differently, based on the fact that humans can get further with less training data and that they learn actively during operation, but not so differently that 'learning stats' would be an inaccurate description.

1 comments

RaftPeople 29 days ago

> What I'm saying is that this is incorrect. An "idea" exists within a model before it generates tokens.

If that were the case, then the systems would generate words based on the fully resolved idea, but that is not how the LLM systems currently work (per vendors' descriptions).

They choose words sequentially, and both the specifics of the input and the output words already chosen significantly impact not just the rest of the output but its very correctness.
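
For illustration, sequential word choice of this kind can be sketched with a toy sampler. This is a minimal sketch, not any vendor's implementation: the `BIGRAMS` table and the `generate` function are hypothetical, and the toy conditions only on the previous token, whereas a real LLM conditions on the whole preceding context. It shows only that each sampled word constrains what can follow; it says nothing either way about what is represented internally before sampling.

```python
import random

# Hypothetical toy "model": next-token probabilities given only the previous token.
# A real LLM conditions on the entire prefix; this table is an illustrative assumption.
BIGRAMS = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a": {"cat": 0.3, "bird": 0.7},
    "cat": {"sleeps": 0.9, "</s>": 0.1},
    "dog": {"barks": 0.8, "</s>": 0.2},
    "bird": {"sings": 1.0},
    "sleeps": {"</s>": 1.0},
    "barks": {"</s>": 1.0},
    "sings": {"</s>": 1.0},
}


def generate(rng, max_tokens=10):
    """Sample one token at a time; each choice becomes input for the next step."""
    tokens = ["<s>"]
    while len(tokens) <= max_tokens:
        dist = BIGRAMS[tokens[-1]]           # distribution depends on what was already chosen
        choices, weights = zip(*dist.items())
        nxt = rng.choices(choices, weights=weights, k=1)[0]
        if nxt == "</s>":                    # end-of-sequence marker stops generation
            break
        tokens.append(nxt)
    return tokens[1:]                        # drop the start marker


if __name__ == "__main__":
    print(generate(random.Random(0)))
```

Once the sampler has picked "a", for example, "dog" can no longer follow, which is the sense in which an early choice shapes the rest of the output.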

> but not so differently that 'learning stats' would be an inaccurate description.

Agreed, humans are generalizing using some mechanism that can be modeled with math.

But the execution of our reasoning and thought processes is not obviously similar to LLM's next word generation based on probabilities.

ToValueFunfetti 29 days ago

>that is not how the LLM systems currently work (per vendors' descriptions)

Anthropic says of their model[0]:

"""Claude sometimes thinks in a conceptual space that is shared between languages, suggesting it has a kind of universal “language of thought.”

{...}

Claude will plan what it will say many words ahead, and write to get to that destination. We show this in the realm of poetry, where it thinks of possible rhyming words in advance and writes the next line to get there. This is powerful evidence that even though models are trained to output one word at a time, they may think on much longer horizons to do so."""

Anthropic also created 'Golden Gate Claude'[1] by identifying the region of its architecture that corresponded to the concept of the Golden Gate Bridge and activating it. What would such a region exist for if Claude could only think one token at a time?
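
As a rough analogue of "activating" a concept, here is a toy sketch of steering an internal vector along a concept direction. Everything in it is an assumption for illustration: the three-dimensional vectors, `BRIDGE_DIRECTION`, `UNEMBED`, and the helper names are made up, and this is not Anthropic's actual procedure. It only shows that if a concept corresponds to a direction in the internal state, pushing the state along that direction changes which word comes out.

```python
def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def steer(hidden, direction, strength):
    """Return a copy of the hidden state pushed along a concept direction."""
    return [h + strength * d for h, d in zip(hidden, direction)]


def top_token(hidden, unembed):
    """Pick the token whose (hypothetical) output vector best matches the hidden state."""
    scores = {tok: dot(hidden, vec) for tok, vec in unembed.items()}
    return max(scores, key=scores.get)


# Hypothetical concept direction and output vectors; real models use thousands of dimensions.
BRIDGE_DIRECTION = [0.0, 1.0, 0.0]
UNEMBED = {
    "weather": [1.0, 0.0, 0.0],
    "bridge": [0.0, 1.0, 0.0],
    "recipe": [0.0, 0.0, 1.0],
}

hidden = [0.9, 0.1, 0.3]  # made-up internal state for a prompt about the weather

if __name__ == "__main__":
    print(top_token(hidden, UNEMBED))                                # weather
    print(top_token(steer(hidden, BRIDGE_DIRECTION, 5.0), UNEMBED))  # bridge
```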

>the execution of our reasoning and thought processes is not obviously similar to LLM's

"Not obviously similar" I can agree with. I don't think you've identified a way in which they are obviously different, though.

[0] https://www.anthropic.com/research/tracing-thoughts-language...

[1] https://www.anthropic.com/news/golden-gate-claude