As compute has outpaced memory bandwidth, most recent models have moved away from ReLU. I think Llama 3.x uses SwiGLU, a gated unit built on Swish/SiLU. That is still probably closer to ReLU than to the logistic sigmoid, but it's back to being something smoother than ReLU.
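For anyone who hasn't met it, here is a rough sketch in plain Python of how SwiGLU relates to ReLU and the sigmoid. The function names and the bias-free three-matrix layout (`W_gate`, `W_up`, `W_down`) are my own assumptions for illustration, not taken from any particular model's code.

```python
import math

def relu(x: float) -> float:
    # Hard kink at zero: exactly 0 for negative inputs.
    return max(0.0, x)

def sigmoid(x: float) -> float:
    # Logistic sigmoid, written to avoid overflow for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def silu(x: float) -> float:
    # Swish/SiLU: x * sigmoid(x). Smooth everywhere, tends to ReLU for large |x|.
    return x * sigmoid(x)

def matvec(W, x):
    # Plain matrix-vector product on nested lists.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def swiglu_ffn(x, W_gate, W_up, W_down):
    # Gated feed-forward block (assumed layout, no biases):
    # out = W_down @ (silu(W_gate @ x) * (W_up @ x))
    gate = [silu(g) for g in matvec(W_gate, x)]
    up = matvec(W_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(W_down, hidden)

if __name__ == "__main__":
    for v in (-10.0, -1.0, 0.0, 1.0, 10.0):
        print(v, relu(v), silu(v), sigmoid(v))
```

The printed values show the point: away from zero SiLU tracks ReLU closely, while near zero it bends smoothly instead of kinking.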
Indeed, there have been so many new activation functions that I stopped following the literature after I retired. I am glad to see that people are trying out new things.