by bravura
1047 days ago
Like a lot of research, unless there's a clear explanation supported by rigorous study, they probably hill-climbed at random through a bunch of cool new one-line changes and stopped when it was time to start writing the paper and running ablation studies.
It's fine. I waited a while before adopting ReLU over tanh as my default for all hidden layers (the non-final ones, which don't output a probability).