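Transformers famously employ the Softmax activation inside the attention mechanism, where it normalizes each row of the query-key score matrix into attention weights. Apart from that, it is very rare to see Softmax anywhere other than the final layer.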
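
As a rough illustration of where that Softmax sits, here is a minimal pure-Python sketch of scaled dot-product attention. The names `softmax` and `attention` and the list-of-lists layout are assumptions made for this example, not any library's API. A real implementation would use batched tensor operations.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention on lists of row vectors.

    Q: queries, K: keys, V: values (hypothetical toy layout).
    Returns (outputs, weights), where weights[i] is the softmax-normalized
    row of attention weights for query i.
    """
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:
        # Raw scores: dot product of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        # Softmax is applied here, inside the layer, not at the network output.
        w = softmax(scores)
        weights.append(w)
        # Output is the weighted average of the value vectors.
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))]
        )
    return outputs, weights
```

Each row of `weights` is non-negative and sums to 1, so every output row is a convex combination of the value vectors.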