| HN Mirror


	by phowon 2686 days ago
	The BERT paper also introduced BERT Base, which has 12 layers and approximately the same number of parameters as GPT, yet still outperforms GPT on GLUE.