by sebzim4500 1135 days ago
What's an LMM? I've never seen the term.

> Comparing weighted next-word-engines to feeling, thinking, aware beings is insulting

Why is it reasonable to be so reductionist about e.g. GPT-4 but not be so reductionist about a biological brain? E.g., why can't I say that your brain is nothing but a bunch of biological neurons trained using its input and initialized based on your genetics? It's equally true, and equally missing the point.


I think that machine learning probably can produce something akin to a brain, but LLMs are not really it even if they use the digital equivalent of a neuron. As far as I understand from what I have read about LLMs, they really seem to be descendants of Markov chains. I think they are valuable and can go a long way, but LLMs themselves will not be "it". I think that we will hit a ceiling with them within 10 years if we don't come up with something else. I think the ceiling can be made pretty high though.

However, most probably in 10 years we will all laugh at how all of our predictions missed by a long shot.

LMM = Large Markov Model. I use that term because models like GPT-4 and friends are for all intents and purposes Markov chains with more data, more compute, some lossy compression, and a bit of nearest neighbor search. Next-word-engines.

> why can't I say that your brain is nothing but a bunch of biological neurons trained using its input and initialized based on your genetics?

Because we don't think one word at a time, and we don't restart from scratch for every subsequent word.

>Because we don't think one word at a time

In what sense does an LLM think one word at a time that doesn't also apply to a person typing at a keyboard? I'm typing one word at a time right now, I assume you aren't about to declare me a Markov chain. When I read, my brain presumably ingests one word at a time (not sure if it's one exactly, but it can't be much more than one). It is of course true that I have some notion of what I'm going to say before I write the first word, but seemingly so does an LLM.

If it was truly thinking one word at a time, it wouldn't be able to consistently use 'an' vs 'a' correctly, for example.

>we don't restart from scratch for every subsequent word.

LLMs don't restart from scratch for every word; via the attention heads they can look back through the entire context. Otherwise the memory required for inference wouldn't scale with the context length.
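To make the memory point concrete, here is a rough sketch of how that per-sequence memory grows, assuming the usual key/value cache used in transformer inference with standard multi-head attention. The function name and the configuration numbers are hypothetical, picked only for illustration, and are not the dimensions of any particular model.

```python
def kv_cache_bytes(context_len, n_layers, n_heads, head_dim, bytes_per_value=2):
    """Approximate bytes needed to cache keys and values for one sequence.

    Assumes one key vector and one value vector per token, per head, per layer,
    stored at bytes_per_value bytes each (2 corresponds to 16-bit floats).
    """
    per_token = 2 * n_layers * n_heads * head_dim * bytes_per_value  # 2 = keys + values
    return context_len * per_token


# Hypothetical configuration, for illustration only.
cfg = dict(n_layers=32, n_heads=32, head_dim=128)

for n in (1024, 2048, 4096):
    mib = kv_cache_bytes(n, **cfg) / 2**20
    print(f"context={n:5d} tokens -> {mib:.0f} MiB of cached keys/values")
```

Under these assumptions the cache grows linearly with the number of tokens in the context, which is what you would expect if every earlier token stays available to the attention heads.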

> In what sense does an LLM think one word at a time that doesn't also apply to a person typing at a keyboard?

Because you already have the thought formed before you started typing.

> When I read my brain presumably ingests one word at a time (not sure if it's one exactly, but it can't be much more than one)

And these models ingest many vectors at once, up to the context length. Your brain is also recursive, and regularly goes backwards to rescan earlier words as necessary.

Seems to me it's fundamentally inverted from how we operate, both input and output.

>Because you already have the thought formed before you started typing.

Can you prove that GPT-4 doesn't? Clearly there is a sense in which it thinks more than one word ahead, since as I mentioned above it would not otherwise be able to use 'a' vs 'an' correctly.

As far as I am aware, exactly to what extent these models have determined what tokens will be generated before they produce anything is an open question in mechanistic interpretability research. I would be very interested if you knew of some work that answers this question empirically.

GPT is not Markovian; it has state.

Then it's Markov-like with state. Or, as I've taken to calling them lately, Markov+state. (I couldn't resist, sorry.)

A truck towing a trailer isn't just a car because it pivots in the middle and has more wheels. Its fundamentals of operation are still closer to a car or truck without trailer than a bicycle.

Humans can form thoughts and get to mostly correct answers even as a gut feeling, and the language to explain why/how need not even be present. We don't form thoughts one word at a time.

No, it is not Markov-like. GPT models are not Markov processes by definition. They take into account all previous words in the sequence when generating the next word. They have a type of memory in the form of an attention mechanism that refers to multiple previous states when generating tokens.

They are not human-like and they are not Markov-like. GPT is a separate category.
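
A toy sketch of the distinction being argued here, assuming "Markov" is read in the first-order sense: the next-word distribution depends only on the previous word. The two predictors below are hypothetical lookup tables over a two-sentence corpus, not how GPT works internally; the second one just conditions on the whole prefix to show what "taking all previous words into account" changes. (With a fixed context window one could still describe such a model as a very high-order chain over whole contexts, so the sketch illustrates the disagreement rather than settling it.)

```python
from collections import Counter, defaultdict

# Tiny illustrative corpus (made up for this example).
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]


def train_bigram(sents):
    """First-order Markov chain: count next words given only the previous word."""
    counts = defaultdict(Counter)
    for s in sents:
        for prev, nxt in zip(s, s[1:]):
            counts[prev][nxt] += 1
    return counts


def train_full_context(sents):
    """Toy full-context table: count next words given the entire prefix."""
    counts = defaultdict(Counter)
    for s in sents:
        for i in range(1, len(s)):
            counts[tuple(s[:i])][s[i]] += 1
    return counts


def bigram_next(counts, context):
    # Only the last word of the context is consulted.
    return dict(counts[context[-1]])


def full_next(counts, context):
    # The whole context is consulted.
    return dict(counts[tuple(context)])


bigram = train_bigram(sentences)
full = train_full_context(sentences)

a = ["the", "cat", "sat", "on", "the"]
b = ["the", "dog", "sat", "on", "the"]

print(bigram_next(bigram, a) == bigram_next(bigram, b))  # same last word, same prediction
print(full_next(full, a), full_next(full, b))            # earlier words change the prediction
```

The first-order chain cannot tell the two contexts apart because they end in the same word, while the full-context table predicts "mat" after the cat sentence and "rug" after the dog sentence.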