It'll handle the simple cases amazingly, and will handle edge cases by producing wrong code: hopefully obviously wrong, but subtly wrong in at least some cases. A prompt will be written and honed and evolved, and tooling will be built to post-process GPT-4's output, and so the accuracy will rise – but still with no correctness guarantees.

When it goes wrong, the advice will include "write better comments, so the transpiler knows what you're doing". Proponents will liken this to type-linting comments. Critics will liken this to INTERCAL / p-hacking / tax fraud, and will claim that the transpiler can be misled by confusing comments. Proponents will show you that GPT-4 can identify misleading comments in the critics' examples. Critics will say "real code won't contain comments like that, so this ability is useless". Proponents will say "oh, yeah, that too I guess". Critics will promptly vanish in a puff of logic.

The manually-written tool will get better: more slowly at first, but more steadily, and with only a few (predictable, fixable) correctness bugs. Eventually, it will be able to correctly process more programs than the leading GPT-4 approach can. It will be months before anyone notices this, since the two camps (manual approach, GPT-4 approach) will not really be talking to each other enough.

Eventually, somebody will write a blog post about a semi-obscure but representative benchmark (perhaps the Linux kernel), pointing out that the manual tool works better now. There will be a brief wave of hype about the "new tool" and the "death of AI". Then some people will fine-tune the model on tricksy cases using the manually-written tool's output, some other people will call that utterly cheating, and the hype will give way to bickering.

Realistic? Well… GPT-4 is proprietary, and we've got more efficient LLM architectures now – but I think the sorts of people who'd make a tool like this will probably stick with OpenAI's APIs. (It's In The Cloud.™)