
	by Inufu 1577 days ago
	On many natural language tasks there can be significant overlap between the training data and the evaluation set, making it difficult to judge performance. That's why I like more complex code generation tasks, such as the dataset we used for AlphaCode.