Hacker News
by visarga 1034 days ago
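Technically, transformers condition on the entire past, not just the last step, whereas RNNs are Markov chains: each step depends only on the previous hidden state and the current input. RNNs have information bottleneck issues though, since everything they remember about the past has to fit in that fixed-size state.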
1 comment

Transformers (for NLP) also perform steps on Markov chains. The difference is that with transformers (for NLP), the Markov chain being followed changes at every step.
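
To make the distinction concrete, here is a toy sketch in Python. It is not a real RNN or transformer: `rnn_step`, `rnn_predict`, and `full_context_predict` are made-up names, the "RNN" state is a single bit, and the "attention" is just a count over the whole history. It only illustrates the bottleneck: two different pasts that collapse to the same recurrent state become indistinguishable, while a predictor that reads the entire past can still tell them apart.

```python
from collections import Counter

def rnn_step(state, token):
    """Toy recurrent update: the new state depends only on (state, token)."""
    # The state is one bit: the parity of the number of "a" tokens seen so far.
    return (state + (token == "a")) % 2

def rnn_predict(tokens):
    """Predict the next token from the final one-bit state alone."""
    state = 0
    for token in tokens:
        state = rnn_step(state, token)
    return "a" if state == 0 else "b"

def full_context_predict(tokens):
    """Toy stand-in for attention: the prediction reads every past token."""
    counts = Counter(tokens)
    # Predict whichever token has been seen less often (ties go to "a").
    return "a" if counts["a"] <= counts["b"] else "b"

history_1 = ["a", "a", "b"]
history_2 = ["b"]

# Both histories collapse to the same one-bit state, so the toy RNN
# cannot tell them apart.
same_rnn = rnn_predict(history_1) == rnn_predict(history_2)
# The full-context predictor sees different pasts and answers differently.
same_full = full_context_predict(history_1) == full_context_predict(history_2)
print(same_rnn, same_full)
```

If you treat the whole prefix as the state, the full-context predictor is also taking a Markov step, just over a state that grows with every token instead of staying fixed-size.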