by dmos62 415 days ago
If the benchmarks aren't lying, Mercury Coder Small is as smart as 4o-mini and costs the same, but is an order of magnitude faster when outputting (unclear if pre-output delay is notably different). Pretty cool. However, I'm under the impression that 4o-mini was superseded by 4.1-mini and 4.1-nano for all use cases (correct me if I'm wrong). Unfortunately they didn't publish comparisons with the 4.1 line, which feels like an attempt to manipulate the optics. Or am I misreading this?

Btw, why call it "coder"? 4o-mini level of intelligence is for extracting structured data and basic summaries, definitely not for coding.

1 comment

It appears to be purpose-trained for coding. They also have a generalist model, but that's not the one being compared.

I agree, the comparison is dated and cherry-picked, and it doesn't reference the thinking models people actually use for coding.

But it's also a bit of a new architecture in early stages of development/testing. Comparing against other small non-thinking models is a good step. It demonstrates the strategy is viable and worth exploring. Time will tell its value. Perhaps a guiding LLM could lean on diffusion to speed up generation. Perhaps we'll see more mixed-architecture models. Perhaps diffusion beats out current LLMs, but from my armchair this seems unlikely.