HN Mirror

by bel8 7 days ago

DeepSeek V4 Flash being the winner in cost efficiency causes me exactly zero surprise.

It's a monster at coding. And a fast monster at that.

I use it daily and have been testing if MiMo 2.5 (non pro) is comparable. The nice thing about MiMo is that it has vision capability.

3 comments

altmanaltman 6 days ago

DeepSeek V4 Flash and Pro are both surprisingly good at coding. I shifted to them from Claude due to cost concerns and haven't really looked back. I would say Claude is still better overall when it comes to complex tasks, but my current workflow never involves delegating complex or actual thinking tasks to agents, just implementation; I do all the testing and thinking myself.

tombert 6 days ago

I threw twenty bucks into DeepSeek just to see how it compared to Claude.

Pretty well, actually! It wasn't quite as good (at least with the coding tasks I threw at it), but it was so much cheaper per-token that it almost doesn't matter; if it screws up something, just correct and try again.

rgbrgb 7 days ago

Notably it has 0 wins.

plaguuuuuu 7 days ago

Friendo, this is an anti-benchmark to figure out which AI is more likely to kill you.

If you point both at some github issues you can gauge their relative ability to solve problems.

Petersipoi 6 days ago

No, it's a test of how good an AI is at completing this given task. You can't extrapolate beyond that, and that is what makes this article so annoying. Grok got good at the task that was given. That doesn't mean that Grok is going to use the same strategy if given an entirely different task. Grok obviously didn't need collaboration to win, as made evident by the fact that it won without collaboration. Anyone who is claiming that Grok wouldn't collaborate if it was beneficial is just guessing.

luipugs 7 days ago

"if you judge a fish by its ability to climb a tree" yada yada

eru 6 days ago

Well, monkeys are botanically speaking fish. Well, cladistically.

bel8 7 days ago

Not much less than GPT 5.4 with 2 wins or gemini-3.1-pro with 3 wins in 30 rounds.

Such is life in royal rumble games.
