by Inufu 1530 days ago
On many natural language tasks there can be significant overlap between training and evaluation data, making it difficult to judge performance. That's why I like more complex code generation tasks, such as the dataset we used for AlphaCode.