Hacker News new | ask | show | jobs
by megalomanu 369 days ago
We just tested magistral-medium as a replacement for o4-mini in a user-facing feature that relies on JSON generation, where speed is critical. Depending on the complexity of the JSON, o4-mini runs ranged from 50 to 70 seconds. In our initial tests, Mistral returned results in 34–37 seconds. The output quality was slightly lower but still remain acceptable for us. We’ll continue testing, but the early results are promising. I'm glad to see Mistral prioritizing speed over raw power, there’s definitely a need for that.
2 comments

I bet you can close the gap with a finetune.

Should be quiet easy if you have some o4-mini results sitting around.

I am curious why you would choose a reasoning model for JSON generation?

I was recently working on a user facing feature using self-hosted Gemma 27b with VLLM and was getting fully formed JSON results in ~7 seconds (even that I would like to optimize further) - obviously the size of the JSON is important but I’d never use a reasoning model for this because they’re constantly circling and just wasting compute.

I haven’t really found a super convincing use-case for reasoning models yet, other than a chat style interface or an assistant to bounce ideas off of.

It is for generating a big nested JSON, quite complex from a business standpoint (lots of different business concepts). We didn't have good results with simple models.