by quag
872 days ago
Is the author claiming that LLMs are Markov chain text generators? That is, that the probability distribution over the next generated token is the same as the frequency with which those token sequences occur in the training data? If so, does that suggest we could “just” build a Markov chain from the original training data and get performance similar to the LLM?
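To make the question concrete, here is a minimal sketch of what "just build a Markov chain from the training data" could look like: count which token follows each fixed-length context, then sample from those counts. The function names, the whitespace tokenization, and the context length `order=2` are all illustrative assumptions, not anything from the article. Note that in this sketch a context that never appeared in the corpus has no distribution at all.

```python
import random
from collections import Counter, defaultdict

def build_chain(tokens, order=2):
    """Count which token follows each `order`-token context in the corpus."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - order):
        context = tuple(tokens[i:i + order])
        counts[context][tokens[i + order]] += 1
    return counts

def next_token_distribution(counts, context):
    """Empirical P(next | context); empty dict if the context was never seen."""
    followers = counts.get(tuple(context))
    if not followers:
        return {}
    total = sum(followers.values())
    return {tok: n / total for tok, n in followers.items()}

def generate(counts, seed, length, rng=None):
    """Sample up to `length` tokens, using the last len(seed) tokens as context."""
    rng = rng or random.Random(0)
    out = list(seed)
    order = len(seed)
    for _ in range(length):
        dist = next_token_distribution(counts, out[-order:])
        if not dist:
            break  # unseen context: the chain has nothing to sample from
        toks, probs = zip(*dist.items())
        out.append(rng.choices(toks, weights=probs, k=1)[0])
    return out

# Toy corpus (hypothetical); a real experiment would use the actual training text.
tokens = "the cat sat on the mat the cat ran".split()
chain = build_chain(tokens, order=2)
print(next_token_distribution(chain, ["the", "cat"]))
print(" ".join(generate(chain, ["on", "the"], 3)))
```

Here the next-token probabilities are exactly the relative frequencies in the corpus, which is the property the question is asking about.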