GLM 5.1 gets close to 4.6. It can happily run for hours and achieve a result. I've given it bugs like a race condition that led to a count being out by 1 after millions of operations, somewhere in a hundred thousand lines of C code littered with locks and atomic swaps, and it found it (as did Opus). Most other models can't.
I'm using Fable now and GLM 5.1 doesn't really compare. But GLM is literally 1/20 the price. I can't use Fable for coding - it's too expensive. So now we have three levels of models - lightweight ones you dispatch en masse to find things, ones capable of agentic coding tasks that can run for hours, like Opus and GLM (and possibly open source ones - I've only tried a few), and now Fable, which is a truly helpful "architecture buddy". Fable still makes many, many mistakes, so you have to review every word it writes.