Hacker News
by oneearedrabbit 1128 days ago
I think in your notation it should have been:

y=Wx_0

y=W(x)x_0
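A minimal Python sketch of the difference between the two forms. It assumes, purely for illustration, that W(x) is built the way self-attention builds its mixing weights: a row-wise softmax over the similarity matrix x x^T, with each token reduced to a scalar. The helper names (matvec, static_layer, input_dependent_weights, dynamic_layer) are hypothetical and do not come from the original post.

```python
import math


def matvec(W, v):
    """Multiply a matrix W (list of rows) by a vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(row):
    """Numerically stable softmax of a list of floats."""
    m = max(row)
    exps = [math.exp(r - m) for r in row]
    total = sum(exps)
    return [e / total for e in exps]


def static_layer(W, x0):
    """y = W x_0: W is a fixed parameter that never looks at the input."""
    return matvec(W, x0)


def input_dependent_weights(x):
    """Hypothetical W(x): row-wise softmax of the similarity matrix x x^T.

    Each token is a scalar here, so x x^T is just the outer product.
    This mimics how self-attention derives mixing weights from its input;
    it is an illustrative assumption, not the definition from the post.
    """
    return [softmax([xi * xj for xj in x]) for xi in x]


def dynamic_layer(x, x0):
    """y = W(x) x_0: the weights are recomputed from x on every call."""
    return matvec(input_dependent_weights(x), x0)


print(static_layer([[1, 0], [0, 2]], [3, 4]))  # [3, 8]
print(dynamic_layer([0.0, 0.0], [2.0, 4.0]))   # [3.0, 3.0]
print(dynamic_layer([1.0, 0.0], [2.0, 4.0]))   # same x_0, different x: first entry is no longer 3.0
```

With a fixed W, the output depends only on x_0. With W(x), the same x_0 is mixed differently whenever x changes; passing x_0 = x gives the self-attention-like special case y = W(x)x under the assumptions above.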

I guess I was thinking more about self-attention, so yes. The more general case is covered by your notation!