by pilotneko, 1823 days ago
It is being used in the Reformer architecture to replace the dot-product attention layer. This enables much longer input sequences and reduces computational complexity from O(n^2) to O(n log n), where n is the length of the sequence.
https://huggingface.co/transformers/model_doc/reformer.html
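
A rough sketch of why bucketing cuts the cost, in case it helps. This is not the Reformer code and not the Hugging Face API; the function names (`make_projections`, `lsh_bucket`, `attended_pairs`) and the sizes are made up for illustration. It assumes an angular hash of the form argmax over [xR; -xR] with a random matrix R, and only counts how many query/key pairs would be compared if attention were restricted to vectors sharing a bucket. The real model also sorts by bucket, splits into chunks, and uses several hash rounds, which is roughly where the log factor comes from.

```python
import random
from collections import Counter


def make_projections(dim, n_buckets, seed=0):
    # Hypothetical helper: n_buckets // 2 random Gaussian directions.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)]
            for _ in range(n_buckets // 2)]


def lsh_bucket(vec, projections):
    # Project onto each random direction, then take the argmax over
    # the projections and their negations: argmax([p, -p]).
    p = [sum(v * r for v, r in zip(vec, row)) for row in projections]
    scores = p + [-x for x in p]
    return max(range(len(scores)), key=scores.__getitem__)


def attended_pairs(bucket_ids):
    # Pairs compared if each vector only attends within its own bucket.
    return sum(count * count for count in Counter(bucket_ids).values())


# Illustrative sizes only, not taken from any real configuration.
dim, n, n_buckets = 16, 256, 8
rng = random.Random(1)
projections = make_projections(dim, n_buckets)
vectors = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
bucket_ids = [lsh_bucket(v, projections) for v in vectors]

print("full attention pairs:", n * n)
print("bucketed pairs:", attended_pairs(bucket_ids))
```

Vectors pointing in similar directions tend to land in the same bucket, so most of the large dot products are still found while most of the n^2 comparisons are skipped.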