HN Mirror

Hacker News


	by bitL 2420 days ago
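	RNNs (LSTM/GRU) tend to have issues with scaling: each step depends on the previous hidden state, so training can't be parallelized across the sequence. Attention-based models like the Transformer, on the other hand, process all positions at once and scale extremely well on parallel hardware.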
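A minimal pure-Python sketch of the difference (not from the original comment). It assumes a toy scalar RNN with hypothetical fixed weights `w_h`, `w_x`, and single-head dot-product self-attention with no learned projections (queries = keys = values = inputs). The function names are illustrative only.

```python
import math

def rnn_forward(xs, w_h=0.5, w_x=1.0):
    """Toy scalar RNN (hypothetical weights): h_t = tanh(w_h * h_{t-1} + w_x * x_t).
    Step t needs h_{t-1}, so the loop over time is inherently sequential."""
    h = 0.0
    hs = []
    for x in xs:
        h = math.tanh(w_h * h + w_x * x)
        hs.append(h)
    return hs

def attention_at(xs, i):
    """Output of single-head self-attention at position i.
    Depends only on the inputs xs, never on other positions' outputs."""
    d = len(xs[0])
    q = xs[i]
    # Scaled dot-product scores of query i against every key.
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in xs]
    m = max(scores)  # subtract max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted sum of value vectors.
    return [sum(w * v[j] for w, v in zip(weights, xs)) for j in range(d)]

def self_attention(xs):
    # Each position is independent, so this could run fully in parallel.
    return [attention_at(xs, i) for i in range(len(xs))]

seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
forward = self_attention(seq)
backward = [attention_at(seq, i) for i in reversed(range(len(seq)))][::-1]
print(forward == backward)  # True: computation order doesn't matter
```

The RNN loop cannot start step t before step t-1 finishes, while every attention position can be computed independently. The trade-off is that each position attends to every other, so attention cost grows quadratically with sequence length.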