by dist-epoch 548 days ago
Was the toy model a transformer?

Maybe it's just way too small. You wouldn't use Karatsuba multiplication to compute 3*5.
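
To make the analogy concrete, here is a minimal Python sketch of Karatsuba multiplication with a small-input cutoff. The `CUTOFF` value of 10 is an arbitrary assumption chosen for illustration; real implementations switch over only at far larger operands. Below the cutoff, the function simply uses ordinary multiplication, which is what happens for 3*5.

```python
# Illustrative sketch only: CUTOFF is an assumed value, not a tuned threshold.
CUTOFF = 10


def karatsuba(x: int, y: int) -> int:
    """Multiply two non-negative integers with Karatsuba's three-product split."""
    # Small operands: the recursion overhead is not worth it, multiply directly.
    if x < CUTOFF or y < CUTOFF:
        return x * y

    # Split both numbers at the same bit position.
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask

    # Three recursive products instead of four.
    low = karatsuba(x_lo, y_lo)
    high = karatsuba(x_hi, y_hi)
    mid = karatsuba(x_lo + x_hi, y_lo + y_hi) - low - high

    return (high << (2 * half)) + (mid << half) + low


small = karatsuba(3, 5)  # takes the direct-multiplication branch
big = karatsuba(12345678901234567890, 98765432109876543210)
```

Both branches return the exact product; the only difference is the amount of work done to get there.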

1 comment

That's the wrong simile, given that you would get the same end result in both cases.

I'm not using a transformer, just a plain feedforward network with ReLU and dropout for a simple classifier.
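
For reference, a rough sketch of the forward pass of that kind of model, in plain Python with no framework. The layer sizes, the dropout rate of 0.5, the He-style initialization, and the softmax output are all assumptions for illustration, not details of the actual experiment.

```python
import math
import random


def linear(v, weights, bias):
    """Affine layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, x) for x in v]


def dropout(v, p, training, rng):
    """Inverted dropout: zero each unit with probability p, rescale the survivors."""
    if not training or p == 0.0:
        return list(v)
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in v]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """He-style initialization (an assumed choice), zero biases."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, layers, p_drop=0.5, training=False, rng=None):
    """Hidden layers use Linear -> ReLU -> Dropout; the last layer feeds a softmax."""
    if rng is None:
        rng = random.Random()
    h = x
    for weights, bias in layers[:-1]:
        h = dropout(relu(linear(h, weights, bias)), p_drop, training, rng)
    weights, bias = layers[-1]
    return softmax(linear(h, weights, bias))


rng = random.Random(0)
# Hypothetical sizes: 4 input features, 8 hidden units, 3 classes.
layers = [init_layer(4, 8, rng), init_layer(8, 3, rng)]
probs = forward([0.1, -0.2, 0.3, 0.4], layers)
```

With `training=False` dropout is a no-op, so repeated calls on the same input give the same class probabilities.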

I don't know, I could be wrong. I hope some toy experiment shows that even at low parameter counts it works as well as Adam.