by hodgehog11 1 hour ago
Hard disagree. Opus reports to me like a student. Fable reported to me like a colleague (researcher). It genuinely seemed to pick up on nuance that the other models just don't, even when I tell them explicitly. It's been really frustrating that neither Codex nor Opus can make targeted edits to Fable's code without screwing something subtle up. For context, this is for computational geometry work, so your mileage may vary.

Fable happened to be released after I had been experimenting with Claude Code for roughly two weeks. I had been trying to use Sonnet, and when I switched to Opus it was night and day. My understanding of geometry was maybe not as good as it should've been, and I kept seeing Sonnet say things I knew were wrong, but I didn't know enough about 6DOF camera positioning to ask it to fix them. When I finally asked the right questions, it couldn't answer them at all, which is when I switched to Opus. But! Opus still couldn't really keep 6DOF "in its head." When I left it to its own devices, it tended to come back having forgotten that it needed to track all 6 degrees of freedom, and had collapsed the problem down to 3DOF or just a single angle.

Fable just understood what I was talking about and never needed me to stop it and say "you forgot this thing we talked about." The difference in spatial reasoning capability between the three models is very, very palpable. I am curious to get more time with it because ultimately I feel like I sandbagged it by giving it problems that would've been within Opus' abilities, just with a lot more handholding.

> It's been really frustrating that neither Codex nor Opus can make targeted edits to Fable's code without screwing something subtle up.

Reminds me of the old adage: don't try to be too smart when writing code. Otherwise, dumber people - including your future self - will have trouble working with it.

Some problems are very hard to solve with stupid code. Computational geometry can easily be one of those cases.
Yes, in my project I made so much more progress in 3 days with Fable that it just isn't comparable to how Opus works.
To be fair, labs silently nerf models all the time.

Fable's probably objectively better at full power. I mean, I definitely felt the same difference in competency between Fable and current Opus. But Opus itself has definitely been nerfed, and Fable, even if it comes back to the public for good (probably won't), will get nerfed.

I remember a time when a product didn't suddenly get worse while you were blinking.

That was a nice time. Let us get back to that time. Use open weights models. Own stuff.

Maybe I was getting downgraded to Opus 4.8 but I saw nothing even close to resembling this behavior when using Fable.
Wait, so..

This is interesting. The "reported to me like a colleague" part.

Is it just that Anthropic gave Mythos even more of that Anthropic™ character, (incorrectly) radiating confidence?

Is that why people have been losing their minds over that thing? Is this just cheap social engineering?

I mean I bet it is also slightly more capable than opus, but that would all check out to me. Man.

Thanks for sharing I suppose.

the primary difference i noticed is that fable didnt try to check in every minute

to an extent that might have done it, but i had been playing around ahead of time trying to reverse engineer my ray bans case so i can make my own plastic insert, and fable took opus' work from mostly broken to mostly done, and then when fable went away, opus broke it again

No, it’s just a fundamentally much better model. Going back to Opus feels like the model has been lobotomized. It makes much more frequent errors, especially of the “I claimed I tested x, y, and z, but actually only kinda half-heartedly tested x, and assumed I understood what was wrong” variety.
Wait but that has been the exact word-for-word complaint when comparing sonnet to opus

Or opus to opus

Or really any new thing to old thing

When the agent is becoming more accurate and thorough what would you expect to be reported?
Oh, I am sure that it became somewhat more accurate, and in that sense the description is in fact technically correct. It just does not work as an explanation for the doomsday-ish hype that model has induced in a lot of people's brains.

The user here is right in what they said but wrong in why they said it, essentially.

An analogy I keep coming back to with the current progress in LLMs is the progress in the 90s of 3D game engines.

Every upgrade made what came before it appear awful in comparison, to such an extent that every upgrade was called "photorealistic" and people kept forgetting that they'd been using that description for the previous engines that they were now dismissing.

https://archive.org/details/nextgen-issue-26

That’s a rather bad faith framing, I think. Who are you to judge why I said something?