by pilotneko 1823 days ago
It is being used in the Reformer architecture to replace the standard dot-product attention layer. This enables much longer input sequences and reduces computational complexity from O(n^2) to O(n log n), where n is the length of the sequence.

https://huggingface.co/transformers/model_doc/reformer.html
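
To make the complexity claim concrete, here is a rough sketch, assuming "it" above refers to locality-sensitive hashing (the technique Reformer uses for its attention). This is not the Hugging Face implementation. The function names `lsh_buckets` and `chunked_attention_pairs`, the bucket count, and the chunk size are all made up for illustration. The idea: hash each vector with a random projection, sort by bucket, and only score pairs inside fixed-size chunks instead of all n^2 pairs.

```python
import numpy as np

def lsh_buckets(vectors, n_buckets, rng):
    """Angular LSH sketch: hash each row to one of n_buckets (must be even)."""
    d = vectors.shape[-1]
    # Random projection shared by every vector in this hashing round.
    projection = rng.standard_normal((d, n_buckets // 2))
    rotated = vectors @ projection
    # Bucket id is the index of the largest entry of [xR, -xR].
    return np.argmax(np.concatenate([rotated, -rotated], axis=-1), axis=-1)

def chunked_attention_pairs(buckets, chunk_size):
    """Count query-key pairs scored when attending only within sorted chunks."""
    # Sorting by bucket id is the O(n log n) step.
    order = np.argsort(buckets, kind="stable")
    n_chunks = int(np.ceil(len(order) / chunk_size))
    chunks = np.array_split(order, n_chunks)
    return sum(len(chunk) ** 2 for chunk in chunks)

rng = np.random.default_rng(0)
n, d = 1024, 64                      # illustrative sizes, not from the model
x = rng.standard_normal((n, d))

buckets = lsh_buckets(x, n_buckets=16, rng=rng)
full_pairs = n * n                   # pairs scored by full dot-product attention
lsh_pairs = chunked_attention_pairs(buckets, chunk_size=64)

print("full attention pairs:", full_pairs)
print("chunked LSH pairs:   ", lsh_pairs)
```

With a fixed chunk size, the number of scored pairs grows linearly in n, so the sort dominates the cost. The real model also uses multiple hashing rounds and lets each chunk look at its neighbor, which this sketch leaves out.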