by ghysznje 494 days ago
(Probably bc I'm dumb) I'm very confused by this paper. The dimensions are all over the place: first they say M is an N x d x d matrix, then it becomes N x d. And then they try to scale M with g_out and add it to E_attn, which is a T x d matrix??? Are the gates scalars, vectors, or matrices? If they are matrices, then the dimensions also don't line up with M.
1 comment

You're not dumb. I think it's just poorly written and full of errors.
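
For what it's worth, the shape complaint in the parent comment is easy to sanity-check. Below is a rough sketch, not anything from the paper: it only takes the shapes as the parent describes them (M as N x d x d or N x d, E_attn as T x d) and asks whether g_out * M + E_attn could work under NumPy-style elementwise broadcasting. The sizes N, T, d and the helper broadcast_shape are made up for illustration, and treating the gate as an elementwise multiply is an assumption; the paper may intend something else.

```python
from itertools import zip_longest

def broadcast_shape(a, b):
    """Return the NumPy-style broadcast shape of shapes a and b, or None if incompatible."""
    out = []
    # Align from the trailing dimension, padding the shorter shape with 1s.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            out.append(y)
        else:
            return None
    return tuple(reversed(out))

# Hypothetical sizes, chosen only so that N, T and d all differ.
N, T, d = 4, 6, 8

E_attn = (T, d)
m_shapes = {"M as N x d x d": (N, d, d), "M as N x d": (N, d)}
# The three readings of the gate asked about in the parent comment.
gate_shapes = {"scalar": (), "vector": (d,), "matrix": (d, d)}

results = {}
for m_name, m_shape in m_shapes.items():
    for g_name, g_shape in gate_shapes.items():
        # Assumed reading: g_out * M is an elementwise (broadcast) product.
        scaled = broadcast_shape(g_shape, m_shape)
        total = broadcast_shape(scaled, E_attn) if scaled is not None else None
        results[(m_name, g_name)] = total
        print(f"{m_name}, {g_name} gate: g_out * M -> {scaled}, + E_attn -> {total}")
```

With N different from T, none of the six combinations produces a shape that can be added to E_attn, whichever way you read the gate. So under these assumptions the sum only makes sense if N equals T or if there is some reduction or projection step the text doesn't spell out.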