by baq 2 days ago
It is on a level above everything else for now, that’s enough to determine it’s quite literally in its own class. Anecdotally it is a good model, sir.

It doesn't seem to be on a level above everything else, no. It seems to be a step increase in some areas and maybe even a decrease in others.

Anecdotally, DeepSeek V4 is a very good model as well, sir. I'm not calling anything V4-class because of that.

I’ve been piloting frontier LLMs for as long as anyone outside of the labs and I just disagree. It is a tier above for some tasks (especially in my usage) and not a downgrade on anything I tried it on. This is enough for me to rank it higher; ymmv.

Fair enough!

I've only briefly tried it and it did seem quite capable for what I was doing, but not that much better than the Chinese models I've been mostly using.

In any case, this [0] seems to paint a more reasonable picture than "it's much better than anything else at everything".

[0] https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos...

Have you used it? It’s clearly a class above. I had it solve so many things in 3 days that it was ridiculous.