by Inufu 1530 days ago
On many natural language tasks there can be significant overlap, which makes it difficult to judge performance. That's why I prefer more complex code generation tasks, such as the dataset we used for AlphaCode.