| HN Mirror



	by octbash 2361 days ago
	Those are question-answering and language-understanding benchmarks respectively, neither of which has been suitable for evaluating language generation models since GPT-1 was roundly beaten by BERT. GPT-2 didn't evaluate on them either.