| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by PartiallyTyped 989 days ago
	The most interesting thing in this whole saga is that decoder only models (aka causal transformers like GPT) are as effective as they are.