| HN Mirror



	by cma 591 days ago
	As compute has outpaced memory bandwidth, most recent models have moved away from ReLU. I think Llama 3.x uses SwiGLU. It's still probably closer to ReLU than to the logistic sigmoid, but it's back to being something smoother than ReLU.
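To make the comparison concrete, here is a minimal pure-Python sketch of ReLU, SiLU (Swish with beta = 1), and a SwiGLU-style gated unit. The helper names (`matvec`, `swiglu`) and the weight matrices `w_gate` and `w_up` are illustrative assumptions, not any particular model's code; a real feed-forward block would also apply a down-projection after the gate.

```python
import math

def relu(x: float) -> float:
    # Piecewise linear: exactly zero for negative inputs, kink at 0.
    return max(0.0, x)

def sigmoid(x: float) -> float:
    # Logistic sigmoid; fine for moderate x (very negative x can overflow exp).
    return 1.0 / (1.0 + math.exp(-x))

def silu(x: float) -> float:
    # SiLU / Swish with beta = 1: x * sigmoid(x).
    # Smooth everywhere, slightly negative for x < 0, close to x for large x.
    return x * sigmoid(x)

def matvec(w, x):
    # Hypothetical helper: multiply matrix w (list of rows) by vector x.
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def swiglu(x, w_gate, w_up):
    # SwiGLU-style unit: SiLU(w_gate @ x) multiplied elementwise by (w_up @ x).
    gate = matvec(w_gate, x)
    up = matvec(w_up, x)
    return [silu(g) * u for g, u in zip(gate, up)]

if __name__ == "__main__":
    for v in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"x={v:+.1f}  relu={relu(v):.4f}  silu={silu(v):.4f}  sigmoid={sigmoid(v):.4f}")
```

For large positive inputs SiLU tracks ReLU closely, while near zero it bends smoothly instead of having a kink, which is the "closer to ReLU than logistic sigmoid, but smoother" behavior described above.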

1 comment

2sk21 590 days ago

Indeed, so many new activation functions have appeared that I stopped following the literature after I retired. I am glad to see that people are trying out new things.
