by hospitalhusband
1135 days ago
LMM = Large Markov Model. I use that term because models like GPT-4 and friends are, for all intents and purposes, Markov chains with more data, more compute, some lossy compression, and a bit of nearest-neighbor search. Next-word engines.
> why can't I say that your brain is nothing but a bunch of biological neurons trained using its input and initialized based on your genetics?
Because we don't think one word at a time, and we don't restart from scratch for every subsequent word.
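For reference, here is roughly what a literal first-order Markov chain over words looks like. This is a toy sketch, not a description of how GPT-4 works: the corpus is made up, and `build_bigram_table` and `generate` are hypothetical names chosen for the example. The point it shows is that each next word is picked by looking only at the current word.

```python
import random
from collections import defaultdict

def build_bigram_table(words):
    """Map each word to the list of words that followed it in the corpus."""
    table = defaultdict(list)
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return table

def generate(table, start, length, seed=0):
    """Sample a chain of words; each step looks only at the current word."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = table.get(out[-1])
        if not followers:  # dead end: this word was never followed by anything
            break
        out.append(rng.choice(followers))
    return out

# Made-up corpus, purely for illustration.
corpus = "the cat sat on the mat and the dog sat on the rug".split()
table = build_bigram_table(corpus)
print(" ".join(generate(table, "the", 8)))
```

Everything before the current word is invisible to `generate`, which is the property the "Markov chain" label implies.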
|
In what sense does an LLM think one word at a time that doesn't also apply to a person typing at a keyboard? I'm typing one word at a time right now, and I assume you aren't about to declare me a Markov chain. When I read, my brain presumably ingests one word at a time (not sure if it's exactly one, but it can't be much more than one). It is of course true that I have some notion of what I'm going to say before I write the first word, but seemingly so does an LLM.
If it were truly thinking one word at a time, it wouldn't be able to consistently use 'an' vs 'a' correctly, for example, since that choice depends on the word that comes next.
> we don't restart from scratch for every subsequent word.
LLMs don't restart from scratch for every word: via the attention heads, they can look back through the entire context. Otherwise the memory required for inference wouldn't scale with the context length.
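To make the memory point concrete, here is a minimal sketch of a single attention head with a key/value cache. It is a toy under loud assumptions: `ToyDecoder` and `attend` are hypothetical names, there are no learned weights, and the key, value, and query are all just the raw token vector. What it does show is that every step reads all earlier entries, and that the cache grows by one entry per token.

```python
import math

def attend(query, keys, values):
    """Softmax-weighted average of all cached values (single head, no batching)."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]

class ToyDecoder:
    def __init__(self):
        self.keys = []    # one cached key per token seen so far
        self.values = []  # one cached value per token seen so far

    def step(self, embedding):
        # Assumption for illustration: key, value, and query are the
        # embedding itself (a real model applies learned projections).
        self.keys.append(embedding)
        self.values.append(embedding)
        return attend(embedding, self.keys, self.values)

decoder = ToyDecoder()
for token_vec in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    out = decoder.step(token_vec)
    print(len(decoder.keys), out)  # cache length grows with every token
```

The cache length is the context length, which is why inference memory scales with it; a true first-order Markov chain would need only the latest token.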