For the self-attention mechanism, memory bandwidth requirements scale roughly quadratically with the sequence length, because the attention score matrix holds one entry for every pair of positions.
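
As a rough illustration of this scaling, the sketch below counts the bytes in the score matrix that standard self-attention would read and write if it materialized the full matrix. The function name `attention_score_bytes`, the single head, and the 2-byte element size are assumptions made for the example, not details from the text.

```python
def attention_score_bytes(seq_len: int, num_heads: int = 1, bytes_per_element: int = 2) -> int:
    """Bytes occupied by the seq_len x seq_len score matrix across all heads.

    Assumes the full matrix is materialized (no tiling or fusion) and that
    each element takes `bytes_per_element` bytes (2 for a 16-bit float).
    """
    return num_heads * seq_len * seq_len * bytes_per_element


# Doubling the sequence length quadruples the size of the score matrix.
for n in (1024, 2048, 4096):
    mib = attention_score_bytes(n) / (1024 ** 2)
    print(f"seq_len={n:5d}  score matrix = {mib:.0f} MiB per head")
```

The quadratic term comes only from the score matrix; the query, key, and value tensors themselves grow linearly with the sequence length.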