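Technically, transformers condition on the entire past rather than just the last step, whereas RNNs are Markov chains: each hidden state depends only on the previous hidden state and the current input. RNNs have information bottleneck issues, though, because that fixed-size hidden state has to summarize the whole history.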
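Here is a toy sketch of that bottleneck. It assumes a hypothetical one-number "RNN" whose state is clipped to [0, 1], and it uses a uniform average over all past inputs as a stand-in for attention. Neither is a real architecture. Two different histories collapse to the same RNN state, so every later RNN step treats them identically. Conditioning on the full prefix still tells them apart.

```python
from typing import List


def rnn_step(h: float, x: float) -> float:
    """Hypothetical toy RNN update: the new state depends only on the
    previous state h and the current input x (the Markov property).
    Clipping to [0, 1] makes the fixed-size state lose information."""
    return max(0.0, min(1.0, h + x))


def rnn_state(xs: List[float]) -> float:
    """Fold the whole sequence into a single float: the bottleneck."""
    h = 0.0
    for x in xs:
        h = rnn_step(h, x)
    return h


def uniform_attention(xs: List[float]) -> float:
    """Hypothetical stand-in for attention: every past input gets equal
    weight, so the output depends on the entire prefix."""
    return sum(xs) / len(xs)


history_a = [5.0, 0.0]
history_b = [1.0, 0.0]

# Both histories reach the same RNN state, so any next RNN step is identical.
print(rnn_state(history_a), rnn_state(history_b))  # 1.0 1.0
# Conditioning on the full prefix keeps them apart.
print(uniform_attention(history_a), uniform_attention(history_b))  # 2.5 0.5
```

Real RNNs use learned weights and wider states, but the structural point carries over. Whatever `rnn_step` throws away cannot be recovered later.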
Transformers (for NLP) also take steps along Markov chains. The difference is that, with transformers, which Markov chain they are moving along changes at every step, because the next-token probabilities depend on the whole prefix generated so far.
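A minimal sketch of the "chain changes every step" view, using a hypothetical two-token model rather than a real transformer. The toy `next_token_probs` depends on the fraction of `"a"` tokens in the entire prefix. Viewed as a chain over the last token only, the transition row for last token `"a"` therefore differs depending on what came before. Viewed as a chain over the full prefix, it is Markov again, but the state grows by one token per step.

```python
from typing import Dict, Tuple


def next_token_probs(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Hypothetical toy model over the vocabulary {"a", "b"}: it favors
    whichever token is rarer in the *entire* prefix, not just the last one."""
    frac_a = prefix.count("a") / len(prefix)
    return {"a": 1.0 - frac_a, "b": frac_a}


# Same last token ("a"), different histories -> different transition rows.
print(next_token_probs(("a",)))      # {'a': 0.0, 'b': 1.0}
print(next_token_probs(("b", "a")))  # {'a': 0.5, 'b': 0.5}
```

The toy rule is arbitrary. The point is only that a model conditioning on the full prefix cannot be described by one fixed transition matrix over single tokens.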