by eden-u4 552 days ago
That's the wrong simile, given that you would get the same end result in both cases.

I'm not using a transformer, just a plain feedforward network with ReLU and dropout for a simple classifier.
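
A minimal sketch of that kind of model, in plain Python with no framework. It assumes a single hidden layer; the function names (`init_params`, `forward`), the layer sizes, and the dropout rate are illustrative and not the actual code.

```python
import random

def relu(xs):
    # Elementwise ReLU: negative activations become zero.
    return [max(0.0, x) for x in xs]

def dropout(xs, p, training, rng):
    # Inverted dropout: during training, zero each unit with probability p
    # and scale the survivors by 1 / (1 - p); at eval time, pass through.
    if not training or p == 0.0:
        return list(xs)
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in xs]

def linear(xs, weights, bias):
    # weights holds one row per output unit.
    return [sum(w * x for w, x in zip(row, xs)) + b
            for row, b in zip(weights, bias)]

def init_params(n_in, n_hidden, n_out, seed=0):
    # Small uniform random weights, zero biases (an arbitrary choice).
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                for _ in range(rows)]
    return {"w1": mat(n_hidden, n_in), "b1": [0.0] * n_hidden,
            "w2": mat(n_out, n_hidden), "b2": [0.0] * n_out}

def forward(xs, params, p=0.5, training=False, rng=None):
    # Linear -> ReLU -> dropout -> linear; returns unnormalized class logits.
    rng = rng or random.Random(0)
    hidden = relu(linear(xs, params["w1"], params["b1"]))
    hidden = dropout(hidden, p, training, rng)
    return linear(hidden, params["w2"], params["b2"])

params = init_params(n_in=4, n_hidden=8, n_out=3)
logits = forward([1.0, 2.0, 3.0, 4.0], params)
```

A model this small has only a few dozen parameters, which is the regime where an optimizer comparison would need to be checked separately.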

I don't know, I could be wrong. I hope some toy experiment shows that, even in the low-parameter case, it works as well as Adam.