| HN Mirror

It definitely knows what GTK4 is, when it freaked out on me and lost the code, it was using all gtkmm-4.0 headers, and had the compiler error count down to 10 (most likely with tons of logic errors, but who knows).

But LLMs performance varies (and this is a huge critique!) not just on what they theoretically know, but how, erm, cross-linked it is with everything else, and that requires lots of training data in the topic.

Metaphorically, I think this is a little like the difference for humans in math between being able to list+define techniques to solve integrals vs being able to fluidly apply them without error.

I think a big and very valid critique of LLMs (compared to humans) is that they are stronger at "memory" than reasoning. They use their vast memory as a crutch to hide the weaknesses in their reasoning. This makes benchmarks like "convert from gtkmm3 to gtkmm4" both challenging AND very good benchmarks of what real programmers are able to do.

I suspect if we gave it a similarly sized 2kloc conversion problem with a popular web framework in TS or JS, it would one-shot it. But again, its "cheating" to do this, its leveraging having read a zillion conversion by humans and what they did.