by eden-u4
552 days ago
That's the wrong simile, given that you would get the same end result in both cases. I'm not using a transformer, just a plain feedforward network with ReLU and dropout for a simple classifier. I don't know, I could be wrong. I hope some toy experiment shows that even in the low-parameter case it works as well as Adam.