by Grimm1
1992 days ago
Personally, I implemented this (https://arxiv.org/pdf/1703.03130.pdf) just yesterday. It's a bit older now, but I was looking for a self-attention method that didn't require a full transformer model, and this paper proposed an interesting implementation that wound up being very successful for my problem.
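For anyone curious what that looks like in practice, here is a rough sketch, assuming the linked paper is "A Structured Self-attentive Sentence Embedding" (Lin et al., 2017). This is not my actual code. As I read the paper, the attention matrix is A = softmax(W_s2 tanh(W_s1 H^T)) and the embedding is M = A H, with a penalty ||A A^T - I||_F^2 to keep the attention rows from collapsing onto the same tokens. The helper names, the toy sizes, and the random weights below are made up for illustration; in a real model H would come from an encoder such as a BiLSTM and the weights would be learned.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def structured_self_attention(H, W_s1, W_s2):
    """H: n x d hidden states, W_s1: d_a x d, W_s2: r x d_a.

    Returns (A, M), where A is the r x n attention matrix and
    M = A H is the r x d sentence embedding.
    """
    projected = matmul(W_s1, transpose(H))                      # d_a x n
    hidden = [[math.tanh(v) for v in row] for row in projected]
    scores = matmul(W_s2, hidden)                               # r x n
    A = [softmax(row) for row in scores]                        # each row sums to 1 over tokens
    M = matmul(A, H)                                            # r x d
    return A, M

def redundancy_penalty(A):
    """Squared Frobenius norm of (A A^T - I)."""
    gram = matmul(A, transpose(A))
    size = len(gram)
    return sum((gram[i][j] - (1.0 if i == j else 0.0)) ** 2
               for i in range(size) for j in range(size))

def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights or encoder outputs."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
n, d, d_a, r = 5, 4, 3, 2          # tokens, hidden size, attention size, attention hops
H = random_matrix(n, d, rng)
W_s1 = random_matrix(d_a, d, rng)
W_s2 = random_matrix(r, d_a, rng)

A, M = structured_self_attention(H, W_s1, W_s2)
penalty = redundancy_penalty(A)
```

Each of the r rows of A is a separate distribution over the n tokens, so M ends up as r different weighted views of the sequence rather than a single pooled vector. No query/key/value projections or stacked layers are involved, which is what makes it attractive if you want attention without a transformer.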