by sankha93 2200 days ago

This article is a classic example of Facebook PR doing a wonderful job of selling the research, and of the linked paper [1] claiming too much in its introduction. Please, please talk to actual researchers before you buy such claims.

If you go through the paper, you have to check the evaluation section to see how they measured their success. They used some programs from GeeksforGeeks to evaluate their approach. Problems on GeeksforGeeks do not represent the vast majority of programming tasks encountered in daily life. This is very much in contrast to the overarching claims presented in the introduction of the paper.

Second issue with the evaluation: they use BLEU scores to judge how good their translations are. BLEU makes sense for natural language translation (and even that is widely debated in the NLP community these days). For a program, there is no concept of an almost correct program based on how similar things look; it is either correct or not. E.g., if I am asked to write a program to add two numbers and I write `x - y`, I am not almost correct, I am completely wrong. And in some ways that is what their model does: it optimizes for BLEU scores.
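
To make the `x - y` point concrete, here is a minimal sketch. It does not compute real BLEU; `unigram_precision` is a hypothetical helper that only measures clipped token overlap, which is the simplest ingredient of BLEU-style metrics. The function names and the whitespace tokenization are my own assumptions, not anything from the paper.

```python
from collections import Counter

def unigram_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision: a simplified stand-in for BLEU, not BLEU itself."""
    cand_tokens = candidate.split()
    if not cand_tokens:
        return 0.0
    ref_counts = Counter(reference.split())
    cand_counts = Counter(cand_tokens)
    # Each candidate token counts at most as often as it appears in the reference.
    matched = sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
    return matched / len(cand_tokens)

# Pre-tokenized source text of the "reference" and the "translated" program.
reference_src = "def add ( x , y ) : return x + y"
candidate_src = "def add ( x , y ) : return x - y"

overlap = unigram_precision(candidate_src, reference_src)

def reference_add(x, y):
    return x + y

def candidate_add(x, y):
    return x - y  # one token off, completely different behavior

agrees = reference_add(2, 3) == candidate_add(2, 3)
print(f"token overlap: {overlap:.2f}, same result on (2, 3): {agrees}")
```

Eleven of the twelve tokens match, so the overlap is about 0.92, yet the two functions disagree on almost every input. A surface-similarity score cannot see that difference.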

Third, the correctness of the programs is tested on 10 random inputs. Are 10 random inputs enough to cover the entire input space that a program can accept?
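
A rough back-of-the-envelope sketch of why 10 random inputs can be too few. Everything here is an illustrative assumption on my part: the leap-year functions are hypothetical, the input range of years 1 to 10000 is arbitrary, and I assume inputs are drawn uniformly and independently. I do not know how the paper actually samples its test inputs.

```python
def reference_is_leap(year: int) -> bool:
    """Gregorian leap-year rule."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def translated_is_leap(year: int) -> bool:
    """Hypothetical faulty translation that drops the century rule."""
    return year % 4 == 0

# Assumed input space: years 1..10000 (small enough to enumerate exhaustively).
input_space = range(1, 10001)
failing_inputs = [
    y for y in input_space if reference_is_leap(y) != translated_is_leap(y)
]

# Fraction of inputs on which the faulty translation is wrong.
p_fail = len(failing_inputs) / len(input_space)

# Probability that n independent uniform draws all miss the bug: (1 - p)^n.
n_tests = 10
p_bug_missed = (1 - p_fail) ** n_tests

print(f"failing inputs: {len(failing_inputs)} of {len(input_space)}")
print(f"chance that {n_tests} random tests all pass: {p_bug_missed:.3f}")
```

Under these assumptions the wrong program disagrees with the reference on 75 of 10000 inputs, and 10 random tests would still all pass roughly 93% of the time.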

It is indeed a great advance in the application of ML technology, but it is nowhere close to the broader claims. One can even debate the ROI on time spent gathering and curating data and then checking the correctness of translations from such a system vs. the ROI on writing rules for a rule-based system, since all programming languages are easily expressible that way.

[1]: https://arxiv.org/pdf/2006.03511.pdf