HN Mirror



	by srean 181 days ago
	That's not correct. Even a toy like an exponentially weighted moving average produces unbounded context (of diminishing influence).
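	To make the moving-average point concrete, here is a minimal sketch (the names `ewma` and `alpha` are illustrative, not from the thread). Two inputs differ only in their very first element, yet the final state still differs after 20 more steps, just by a geometrically shrinking amount.

```python
def ewma(xs, alpha=0.5):
    """Exponentially weighted moving average; returns the final state."""
    state = 0.0
    for x in xs:
        # Every past input stays in the state, scaled by (1 - alpha) per step.
        state = alpha * x + (1 - alpha) * state
    return state

# Two sequences that differ only in the first element.
a = [1.0] + [0.0] * 20
b = [0.0] + [0.0] * 20

# The influence of the first element is alpha * (1 - alpha) ** 20: tiny, but not zero.
diff = ewma(a) - ewma(b)
print(diff)
```

	No finite window reproduces this exactly, because the weight on the first input never reaches zero.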

1 comments

matusp 181 days ago

What do you mean? I can only input k tokens into my LLM to calculate the probs. That is the definition of my state. It works exactly the way N-gram LMs use N tokens, except that N-gram LMs calculate the probabilities from observed frequencies instead of using ML models. There is no unbounded context anywhere.
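A minimal sketch of the frequency-based model described here, assuming a context of exactly k tokens (the helper names `train_counts` and `next_token_probs` are made up for illustration). Only the last k tokens of the history are ever looked at, so two histories that end in the same k tokens get the same distribution.

```python
from collections import Counter, defaultdict

def train_counts(tokens, k):
    """Count which token follows each k-token context in the training data."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - k):
        context = tuple(tokens[i:i + k])
        counts[context][tokens[i + k]] += 1
    return counts

def next_token_probs(counts, history, k):
    """Next-token distribution given only the last k tokens of the history."""
    context = tuple(history[-k:])
    followers = counts.get(context, Counter())
    total = sum(followers.values())
    return {tok: n / total for tok, n in followers.items()}

counts = train_counts("a b a b a c".split(), k=1)

# Different histories, same last token: the state is identical.
p_long = next_token_probs(counts, ["c", "b", "a"], k=1)
p_short = next_token_probs(counts, ["a"], k=1)
print(p_long, p_short)
```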


srean 181 days ago

That's different.

You can certainly feed k-grams one at a time, estimate the probability distribution over the next token, use that to simulate a Markov chain, and reinitialize the LLM (drop context). In this process the LLM is just a lookup table for simulating your MC.

But an LLM on its own doesn't drop context to generate; its transition probabilities change depending on the tokens.
