| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by GardenLetter27 9 days ago
	It's not just the architecture but also the data - the decoder only approach lets you train in parallel over blocks of text (no RNN serial waiting), that allows you train on much, much more data.