| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by srean 184 days ago
	I suggest getting familiar with or brushing up on the differences between a Markov Chain and a Markov Model. The former is a substantial restriction of the latter. The classic by Kemeny and Snell is a good readable reference. MC have constant and finite context length, their state is the most recent k tuple of emitted alphabets and transition probabilities are invariant (to time and tokens emitted)

1 comments

matusp 184 days ago

LLMs definitely also have finite context length. And if we consider padding, it is also constant. The k is huge compared to most Markov chains used historically, but it doesn't make it less finite.

link

srean 184 days ago

That's not correct. Even a toy like an exponential weighted moving averaging produces unbounded context (of diminishing influence).

link

matusp 184 days ago

What do you mean? I can only input k tokens into my LLM to calculate the probs. That is the definition of my state. In the exact way that N-gram LMs use N tokens, but instead of using ML models, they calculate the probabilities based on observed frequencies. There is no unbounded context anywhere.

link

srean 184 days ago

That's different.

You can certainly feed k-grams one at a time to, estimate the the probability distribution over next token and use that to simulate a Markov Chain and reinitialize the LLM (drop context). In this process the LLM is just a look up table to simulate your MC.

But an LLM on its own doesn't drop context to generate, it's transition probabilities change depending on the tokens.

link