by cassianoleal 5 hours ago
How do you qualify what makes a model "Mythos class", and how do you reliably test for it?
1 comment

Presumably a deepswe benchmark, which IIRC puts GLM 5.2 between Opus 4.8 and Fable.