| HN Mirror



	by pilotneko 1823 days ago
	It is being used in the Reformer architecture to replace the dot-product attention layer. This enables much longer input sequences and reduces computational complexity from O(n^2) to O(n log n), where n is the length of the sequence. https://huggingface.co/transformers/model_doc/reformer.html
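
A rough sketch of why bucketing helps, assuming "it" here is locality-sensitive hashing as used by Reformer. The hash below follows the angular scheme described in the Reformer paper (argmax over the concatenation of xR and -xR for a random matrix R). Everything else is an illustrative assumption: the function names `lsh_buckets` and `count_pairs`, the sizes, and the seeds are made up for this example, and it omits the sorting, chunking, and multi-round hashing the real model relies on. It only counts how many query-key scores would be computed if attention were restricted to vectors sharing a bucket.

```python
import random
from collections import defaultdict


def lsh_buckets(vectors, n_buckets, seed=0):
    """Group vector indices by angular LSH bucket: argmax over [xR; -xR]."""
    assert n_buckets % 2 == 0, "n_buckets must be even for the [xR; -xR] trick"
    rng = random.Random(seed)
    dim = len(vectors[0])
    half = n_buckets // 2
    # Random projection matrix R with shape (dim, n_buckets // 2).
    R = [[rng.gauss(0.0, 1.0) for _ in range(half)] for _ in range(dim)]
    buckets = defaultdict(list)
    for idx, x in enumerate(vectors):
        proj = [sum(x[d] * R[d][j] for d in range(dim)) for j in range(half)]
        scores = proj + [-p for p in proj]
        bucket_id = max(range(n_buckets), key=scores.__getitem__)
        buckets[bucket_id].append(idx)
    return dict(buckets)


def count_pairs(buckets):
    """Score computations needed if attention stays inside each bucket."""
    return sum(len(members) ** 2 for members in buckets.values())


# Hypothetical toy input: 512 random 16-dimensional vectors.
rng = random.Random(1)
n, dim = 512, 16
vectors = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]

buckets = lsh_buckets(vectors, n_buckets=16)
full_pairs = n * n                 # dense attention compares every pair
lsh_pairs = count_pairs(buckets)   # bucketed attention compares within buckets
print(full_pairs, lsh_pairs)
```

Vectors pointing in similar directions tend to share a bucket, so the dropped comparisons are mostly the ones that would have received small attention weights anyway. The O(n log n) figure in the comment comes from the full scheme, not from this toy count.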