| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by sdenton4 1163 days ago
	That's the point of having multiple realizations of the same underlying model. The (depthwise) convolutional realization is extremely efficient for training, and the RNN is extremely efficient for inference. The scaling in both of these cases is much better than attention layers - as they discuss in the article.