by Grimm1 1992 days ago
Personally, I implemented this just yesterday.

https://arxiv.org/pdf/1703.03130.pdf

It's a bit older now, but I was looking for a self-attention method that didn't require a full transformer model. The paper proposes an interesting approach that wound up being very successful for my problem.
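
For readers who haven't opened the link: the paper (Lin et al., "A Structured Self-attentive Sentence Embedding") computes an attention matrix A = softmax(W_s2 tanh(W_s1 H^T)) over a sequence of hidden states H and uses M = A H as the sentence embedding. Below is a rough, dependency-free sketch of just that step. It is an illustration only, not the implementation mentioned above: the function names, the toy sizes, and the random weights are all assumptions, and in the paper H comes from a bidirectional LSTM and the weights are learned.

```python
import math
import random

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def softmax(row):
    """Numerically stable softmax over a single list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def structured_self_attention(H, W_s1, W_s2):
    """H: (n, d) hidden states, W_s1: (d_a, d), W_s2: (r, d_a).

    Returns (M, A), where A is (r, n) with each row summing to 1
    and M = A H is the (r, d) sentence embedding.
    """
    H_T = [list(col) for col in zip(*H)]                      # (d, n)
    hidden = [[math.tanh(v) for v in row]
              for row in matmul(W_s1, H_T)]                   # (d_a, n)
    scores = matmul(W_s2, hidden)                             # (r, n)
    A = [softmax(row) for row in scores]                      # softmax over the n positions
    M = matmul(A, H)                                          # (r, d)
    return M, A

def random_matrix(rows, cols, rng):
    """Hypothetical helper: random weights standing in for learned ones."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Toy sizes (assumed): 6 tokens, hidden size 8, attention size 4, 3 attention hops.
rng = random.Random(0)
n, d, d_a, r = 6, 8, 4, 3
H = random_matrix(n, d, rng)
W_s1 = random_matrix(d_a, d, rng)
W_s2 = random_matrix(r, d_a, rng)
M, A = structured_self_attention(H, W_s1, W_s2)
```

Each of the r rows of A is a separate distribution over the n tokens, so M stacks r different weighted summaries of the sequence. The paper also adds a penalty term to push those rows apart, which is left out here.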