
	by jimsimmons 1187 days ago
	GP is wrong. The "Attention Is All You Need" paper just proposed an autoregressive (AR) model that didn’t have to be trained step by step. The scaling came later, with BERT, GPT, and OpenAI’s scaling work.
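
	To illustrate what "didn't have to be trained step by step" can mean, here is a minimal sketch of causally masked self-attention. It assumes a single head with q = k = v = x and no learned weights, which is a simplification rather than the paper's full architecture, and the name `causal_attention` is hypothetical. The point is that each position's output depends only on the inputs at or before it, never on an earlier output, so all positions of a training sequence can be computed in one pass.

```python
import math


def causal_attention(x):
    """Single-head self-attention with a causal mask.

    Simplifying assumption: queries, keys and values are all the raw
    input vectors (no learned projections).
    """
    n, d = len(x), len(x[0])
    out = []
    for t in range(n):
        # Causal mask: position t attends only to positions 0..t.
        scores = [
            sum(a * b for a, b in zip(x[t], x[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        # Numerically stable softmax over the visible positions.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        # Weighted average of the visible value vectors.
        out.append(
            [sum(w / z * x[s][j] for s, w in enumerate(weights)) for j in range(d)]
        )
    # No iteration reads `out`, so the rows are independent of each other;
    # real implementations compute them together as one masked matrix product.
    return out


seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
altered = seq[:2] + [[5.0, -3.0]]  # change only the last ("future") token
# Outputs for the first two positions are unaffected by the future token.
print(causal_attention(seq)[:2] == causal_attention(altered)[:2])
```

	A recurrent model, by contrast, needs the hidden state from position t-1 before it can process position t, which is what forces step-by-step computation during training.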