Hacker News
by reic 1003 days ago
Quantification can be done by measuring along at least two dimensions: (1) the size of the synthesised code, and (2) how precisely the generated code matches the input program (roughly: on what fraction of inputs the two programs give different outputs). We have set up a challenge to encourage the community to look into this problem domain. We have also simplified the assumptions to make it more tractable:

- Challenge: https://codalab.lisn.upsaclay.fr/competitions/15096

- Paper describing the challenge: https://arxiv.org/abs/2308.07899
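
To make measure (2) concrete, here is a minimal sketch of how the disagreement fraction between two programs could be estimated over a finite set of inputs. This is not the challenge's official scoring code: the function name `mismatch_fraction` and the two toy programs are hypothetical, chosen only for illustration.

```python
from typing import Callable, Iterable


def mismatch_fraction(reference: Callable, candidate: Callable,
                      inputs: Iterable) -> float:
    """Fraction of the given inputs on which the two programs disagree.

    Hypothetical helper, not taken from the challenge or the paper.
    """
    inputs = list(inputs)
    if not inputs:
        raise ValueError("need at least one input to compare on")
    differing = sum(1 for x in inputs if reference(x) != candidate(x))
    return differing / len(inputs)


# Toy example (assumed, for illustration only):
# the "input program" is abs, the "synthesised program" is the identity,
# which is smaller but wrong on every negative number.
def reference_program(x: int) -> int:
    return abs(x)


def synthesised_program(x: int) -> int:
    return x


if __name__ == "__main__":
    sample = range(-5, 15)  # 20 inputs, 5 of them negative
    print(mismatch_fraction(reference_program, synthesised_program, sample))
```

On this sample the two programs disagree on the 5 negative inputs out of 20. For a large input space one would presumably evaluate on a random sample rather than enumerate every input.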

(I am one of the authors, AMA)

1 comment

How well does (2) really measure accuracy? It seems like a single output that doesn't match the input code's output could indicate a fundamental flaw in the optimized code, so it's essentially 100% wrong even though it gets the correct answer almost all the time.

Good luck on the challenge though, this seems like an interesting and valuable area of research.

Of course (2) is not a perfect measure of accuracy, since it does not quantify how far off a wrong output is, e.g. if 111111111111 is the correct output, then both 111111111110 and 829382934783 count as equally faulty. The main advantage of (2) is that it is natural, easy to understand, and easy to measure and compare. We have to start somewhere. I imagine that, in the future, it can be refined (e.g. by taking the Hamming distance between desired and actual output). I expect that more refined quantification will emerge as the community comes to understand better exactly what is hard in the synthesis of programs.
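
As a rough illustration of the Hamming-distance refinement, here is a sketch that scores the example outputs above. It assumes outputs are non-negative integers and shows two possible conventions (decimal digits and bits); the function names are hypothetical, and neither convention is something the challenge prescribes.

```python
def digit_hamming(desired: int, actual: int) -> int:
    """Number of decimal digit positions in which the two outputs differ.

    Assumption: non-negative integers, with the shorter one left-padded
    with zeros so both have the same number of digits.
    """
    if desired < 0 or actual < 0:
        raise ValueError("this sketch only handles non-negative integers")
    a, b = str(desired), str(actual)
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    return sum(ca != cb for ca, cb in zip(a, b))


def bit_hamming(desired: int, actual: int) -> int:
    """Number of differing bits in the binary representations."""
    if desired < 0 or actual < 0:
        raise ValueError("this sketch only handles non-negative integers")
    return bin(desired ^ actual).count("1")


if __name__ == "__main__":
    correct = 111111111111
    for wrong in (111111111110, 829382934783):
        print(wrong, digit_hamming(correct, wrong), bit_hamming(correct, wrong))
```

Under the digit convention, 111111111110 differs from the correct output in 1 position and 829382934783 in all 12, so the two outputs no longer count as equally faulty.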

Feel free to submit something! A simple submission is probably just a few lines of code.