| HN Mirror


	by solomatov 1182 days ago
	As far as I remember, back in the RNN era the best models were RNNs with attention. Does this thing have any attention mechanism? If it does, then it has the same problem of O(n^2) computation, where n is the window size. My understanding is that transformers are superior because they are much faster to train and evaluate than RNNs.
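
	To make the O(n^2) point concrete, here is a rough sketch of plain dot-product self-attention in pure Python. It is not taken from the model under discussion; the function names and the toy vectors are made up for illustration. The only point is that every position in the window gets scored against every other position, so the score matrix has n * n entries.

```python
import math


def attention_weights(queries, keys):
    """Softmax-normalized dot-product scores: one row per query, one column per key."""
    d = len(queries[0])
    weights = []
    for q in queries:
        # One score per (query, key) pair, so n queries x n keys = n^2 scores.
        row = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        top = max(row)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in row]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


def count_score_entries(n, d=4):
    """Build a toy window of n vectors (hypothetical data) and count the scores computed."""
    window = [[(i + j) / (n + d) for j in range(d)] for i in range(n)]
    weights = attention_weights(window, window)
    return sum(len(row) for row in weights)
```

	Doubling the window from 8 to 16 positions quadruples the number of scores (64 to 256), which is the quadratic growth in question.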