
It appears to be purpose-trained for coding. They also have a generalist model, but that's not the one being compared.

I agree, the comparison is dated and cherry-picked, and it doesn't reference the thinking models people actually use for coding.

But it's also a fairly new architecture in the early stages of development and testing. Comparing against other small non-thinking models is a good step. It demonstrates the strategy is viable and worth exploring. Time will tell its value. Perhaps a guiding LLM could lean on diffusion to speed up generation. Perhaps we'll see more mixed-architecture models. Perhaps diffusion beats out current LLMs, but from my armchair this seems unlikely.