by ravi-delia
1284 days ago
I look at deep sequences of tokens and predict what comes next. Can you milk me? Once you've broadened "basically a Markov chain" to "any function from a sequence of tokens to a probability distribution over tokens", the phrase loses most of its explanatory power. If you had to characterize the difference between a brute-force mapping built on raw frequencies and a model that selectively calculates probabilities based on underlying structure, wouldn't you say the latter is more complex? You don't have to believe the hype, but if you think you can get GPT-level performance out of anything remotely resembling a Markov chain, I encourage you to try.
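A minimal sketch of the "brute force mapping based on pure frequencies" end of that comparison, assuming whitespace tokenization and a fixed context of k = 2 tokens (the helper names are hypothetical, not from any library): an order-k Markov chain simply counts which token followed each exact context in the training text and normalizes those counts.

```python
from collections import Counter, defaultdict

def build_markov_table(tokens, k=2):
    """Count how often each token follows each exact length-k context."""
    table = defaultdict(Counter)
    for i in range(len(tokens) - k):
        context = tuple(tokens[i:i + k])
        table[context][tokens[i + k]] += 1
    return table

def next_token_distribution(table, context):
    """Normalize raw counts into probabilities.
    Contexts never seen in training get an empty distribution."""
    counts = table.get(tuple(context))  # .get() avoids creating new entries
    if not counts:
        return {}
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}

tokens = "the cat sat on the mat the cat ran".split()
table = build_markov_table(tokens, k=2)
print(next_token_distribution(table, ["the", "cat"]))  # {'sat': 0.5, 'ran': 0.5}
```

Because the table only stores contexts it has literally seen, an unseen context such as `("a", "dog")` yields an empty distribution: nothing about the structure of the text carries over, which is the gap the comment is pointing at.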
Obviously that's impractical, and it's not how LLMs actually work: they derive the transition probabilities for a state from the input rather than having them pre-baked. But if the claim is that "these are more sophisticated than a Markov chain", then strictly speaking they aren't; they are in fact a lossy compression of a Markov model.
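A sketch of the tabulation idea behind that argument, assuming a fixed maximum context length k (the model and function names here are hypothetical illustrations): any model that maps a length-k context to a next-token distribution can, in principle, be evaluated on every possible context and written out as a Markov transition table whose states are the last k tokens. The compact function and the exhaustive table give the same probabilities; the table just grows as |V|^k.

```python
from itertools import product

def make_repeat_last_model(vocab):
    """Hypothetical stand-in for a learned model: it *computes* a distribution
    from the context (here, 'the last token is likely to repeat') instead of
    looking one up in a stored table."""
    other = 0.5 / (len(vocab) - 1)

    def model(context):
        last = context[-1]
        return {tok: 0.5 if tok == last else other for tok in vocab}

    return model

def tabulate(model, vocab, k):
    """Evaluate a context -> distribution function on every length-k context.
    The result is an explicit Markov transition table whose states are the
    last k tokens; it has len(vocab) ** k rows."""
    return {ctx: model(ctx) for ctx in product(vocab, repeat=k)}

vocab = ("a", "b", "c")
table = tabulate(make_repeat_last_model(vocab), vocab, k=4)
print(len(table))  # 81 == 3 ** 4
```

With a realistic vocabulary and context length, the table is astronomically large, which is why building it is impractical and why a trained network can be viewed as a compact stand-in for it.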