Hacker News new | ask | show | jobs
by lmeyerov 1056 days ago
This seems like an early paper and agreed with the consternation.

The paper, in a modern context and based solely on the abstract and having been in the community, is chipping at the "uninteresting" part of the problem. Around that time, program synthesis started switching to SMT (satisfiability modulo theory) methods, meaning basically a super powerful & general SAT solver for the broad search ("write a wild python program") and then, for specialized subdomains, have a good way to call out to optimized domain solvers ("write a tiny bit of floating point math here"). The paper would solve what the regex callout looks like.. which is specialized. We can argue regex is one of the most minimal viable steps towards moving to general programming on GPUs. Except as a person who does SIMD & GPU computing, optimizing compute over finite automata is not general nor representative and I don't expect to change my thinking much about the broader computational classes. To be fair to the authors... back then, synthesizing regex & sql were hard in practice even for boring cases.

Separately, nowadays synthesis has shifted to neural (copilot, gpt), and more interesting to me, neurosymbolic in R&D land. We're doing a lot of (simple) neurosymbolic in louie.ai, and I'm excited if/when we can get the SMT solver side in. Making GPT call Z3 & Coq were some of the first programs I tried with it :) Till then, there's a lot more interesting low-hanging fruit from the AI/ML side vs solvers, but feels like just a matter of time.

1 comments

The paper claims that there is no neural / deep learning based solver that performs well on regular expression inference.

Calls to Coq and Z3 will be very slow and not competitive with GPU compute.

Ironically, one of its few comparisons was FlashFill / Excel.. which had been going the way of NNs and now LLMs for years now. The paper largely skips measured and non-superficial comparisons, so I wouldn't draw comparative conclusions based on their writeup.

RE Coq/Z3 vs GPUs, I think about them as different kinds of things, and thus wouldn't compare directly. A solver like z3 encapsulates search tricks for tasks where brute force approaches would effectively run forever. Their tricks are symptomatically necessary -- effective GPU acceleration requires finding a clean data parallel formulation, and if a more naive algorithm is picked for that GPU friendliness, but at the expense of inflating the workload asymptotically, the GPU implementation would have high utilization, and high speedups vs a CPU version... but would still be (much) worse than Z3. Instead of comparing them as inherently different approaches, the question is if the practically relevant core of Z3 can be GPU accelerated while preserving its overall optimization strategies.

Which takes us back to the broader point of this thread of GPUs for synthesis... LLMs are successfully GPU accelerating program synthesis of more than just regex. It's amazing that half of github code now uses them! Closer to this paper, folks are making inroads on GPU accelerating SMT solvers that the more powerful (classical) program synthesizers use, e.g., https://github.com/muhos/ParaFROST .

Could you point towards a paper that does precise and minimal regular expression (or finite state automaton) inference?
That's a great yet narrow request, which is great for POPL and less for PLDI. As a reference, I believe starcoder does worse than GPT4 and has taken time to publish results on program synthesis that subsumes regex. The flashfill team has academic PL roots and would also be good to compare to, as I mentioned.

With prompt engineering being so easy, and naive search/refinement over them so easy too, not comparing via benchmarks, vs brush-off related work section, to these general solvers seems strange.