Hacker News new | ask | show | jobs
by postalcoder 58 days ago
This release Mistral really reminds you of the gap between the frontier labs and everyone else.

Pre-agent, there wasn't always an obvious difference between models. Various models had their charms. Nowadays, I don't want to entertain anything less than the frontier models. The difference in capability is enormous and choosing anything less has a real cost in terms of productivity.

I've been a big fan of the smaller labs like Mistral and especially Cohere but it's been a while since I've been excited by a release by either company.

That said, I'm using mistral voxtral realtime daily – it's great.

3 comments

Can't agree at all. Productivity gap just 1 year ago was much larger for frontier model vs non-frontier. Let alone 2 years ago.
Same. The gap is almost paper thin for anyone who hasn't gone full uninformed vibe code.
When I was thinking pre-agentic, I was actually thinking more pre-"coding seen as the main use case for these models".
Coding has always been the main real-world business usecase since day one. There has been no point since the very first public availability of GPT 3.5 in November 2022, that it wasn't.

A lot of us have been agentic coding since almost 2 years ago, mid-2024. I have. The productivity gap of "best vs 2nd vs 3rd best model" was biggest back then and has slowly been shrinking ever since.

> Pre-agent, there wasn't always an obvious difference between models. Various models had their charms. Nowadays, I don't want to entertain anything less than the frontier models. The difference in capability is enormous and choosing anything less has a real cost in terms of productivity.

It's just apples to oranges.

There is not a clear, across the board, winner on non-agentic tasks between Gemini, ChatGPT, and Claude - the simple chatbot interface.

But Claude Code is substantially better than Codex which itself is notably better than Gemini-cli.

In this vein, it should not be surprising that Claude Code is way better than non-frontier models for agentic coding... It's substantially better than other frontier models at specialized agentic tasks.

I’ve been comparing Claude Code and Codex extensively side by side over the past couple of weeks with my favorite prompting framework superpowers…

From my perspective, Claude Code is decidedly not better than Codex. They’re slightly different and work better together. I would have no issues dropping CC entirely and using codex 100%.

If you’re working off of “defaults”, in other words no custom prompting, Claude Code does perform a lot better out of the box. I think this matters, but if you’re a professional software developer, I’d make the case that you should be owning your tools and moving beyond the baked in prompts.

CC is not better than Codex, nor is it better than OpenCode, Crush, Pi etc…
I think there's a fair amount of evidence that the heavy harnesses actually drag down performance compared to bare harnesses.
> Pre-agent, there wasn't always an obvious difference between models. Various models had their charms. Nowadays, I don't want to entertain anything less than the frontier models.

This is a very naive and misguided opinion. In most tasks, including complex coding tasks, you can hardly tell the difference between a frontier model and something like GPT4.1. You need to really focus on areas such as context window, tool calling and specific aspects of reasoning steps to start noticing differences. To make matters worse, frontier models are taking a brute force approach to results which ends up making them far more expensive to run, both in terms of what shows up on your invoice and how much more you have to wait to get any resemblance of output.

And I won't even go into the topic or local models.

> You need to really focus on areas such as context window, tool calling and specific aspects of reasoning steps to start noticing differences.

This is like saying "the current models and the old models are the same if you ignore every important advance they've made"

> This is like saying "the current models and the old models are the same if you ignore every important advance they've made"

Please go ahead and list the single most important advance a frontier model has over, say, gpt4.1.

Reasoning is one of the main features, and in practice all it does is waste compute to rewrite your original prompt. See how GPT 5.4 burns through compute by running additional prompts where it acts like your own interpreter, with lengthy reasoning prompts on "the user is asking for (...)" as if you are completely unable to stitch together a usable prompt. That's your frontier model.