Hacker News
by in-silico, 46 days ago
I wonder how different their method actually is from other sub-quadratic sparse attention methods like Reformer [1] and Routing Transformer [2].
[1]: https://arxiv.org/abs/2001.04451
[2]: https://arxiv.org/abs/2003.05997