| HN Mirror



	by baq 2 days ago
	It is on a level above everything else for now, that’s enough to determine it’s quite literally in its own class. Anecdotally it is a good model, sir.

1 comments

cassianoleal 1 day ago

It doesn't seem to be on a level above everything else, no. It seems to be a step increase in some areas and maybe even a decrease in others.

Anecdotally, DeepSeek V4 is a very good model as well, sir. I'm not calling anything V4-class because of that.


baq 1 day ago

I’ve been piloting frontier LLMs for as long as anyone outside of the labs and I just disagree. It is a tier above for some tasks (especially in my usage) and not a downgrade on anything I tried it on. This is enough for me to rank it higher; ymmv.


cassianoleal 1 day ago

Fair enough!

I've only briefly tried it and it did seem quite capable for what I was doing, but not that much better than the Chinese models I've been mostly using.

In any case, this [0] seems to paint a more reasonable picture than "it's much better than anything else at everything".

[0] https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos...


khalic 1 day ago

Have you used it? It’s clearly a class above. I had it solve so many things in 3 days that it was ridiculous.
