Hacker News
by gdiamos, 456 days ago
Sure, but why would one prefer tanh instead of normalization layers if they have the same accuracy?
I suppose normalization kernels have reductions in them, but how hard are reductions in 2025?
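To make the reduction point concrete, here is a rough sketch in plain Python of the two kinds of layer being compared. The names `layer_norm` and `scaled_tanh`, and the `alpha` and `eps` values, are illustrative assumptions rather than anything from a specific paper or library, and learnable scale/shift parameters are left out.

```python
import math

def layer_norm(x, eps=1e-5):
    # Two reductions over the whole feature vector: the mean, then the variance.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def scaled_tanh(x, alpha=0.5):
    # Purely elementwise: each output depends only on its own input.
    # alpha is a hypothetical scale factor chosen for illustration.
    return [math.tanh(alpha * v) for v in x]

x = [1.0, 2.0, 3.0, 4.0]
x_changed = [1.0, 2.0, 3.0, 10.0]  # only the last element differs

ln, ln_changed = layer_norm(x), layer_norm(x_changed)
th, th_changed = scaled_tanh(x), scaled_tanh(x_changed)
```

Changing one input moves every `layer_norm` output, because the mean and variance are shared across the vector, while the first three `scaled_tanh` outputs stay exactly the same. That cross-element dependency is the reduction a normalization kernel has to perform and an elementwise kernel does not.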