| This matches my experience. I burned $2K to see how it would perform on frontend and backend tasks. On the frontend, it did a significantly better job than Opus on toy-scale wireframe projects by using gimmicks like fluid dynamics. But when given medium to large tasks, like a multi-page web app where the model itself must decide layouts and aesthetics, the results from Fable and Opus received indistinguishable scores from human judges. On the backend, I gave it tasks related to setting up a data flow involving Postgres, R2, Kubernetes, gVisor, and so on. The noticeable gap was this: Opus did better than Sonnet, but Fable actually returned a result that fails while confidently stating that it had run X, Y, Z tests to ensure it works and had gotten these results. Very surprising, given that neither Opus nor Sonnet suffered from such a problem. The longest frontend task was ~2H; the longest backend task, 8H. Though none of the tasks were related to developing LLMs (just production-grade secure systems that could've been developed 20 years ago, no LLMs involved), it is possible Claude Fable downgraded itself or spat out fake results. There'd be no way of knowing, since Anthropic silently degrades model quality based on undisclosed internal criteria that it claims are about LLMs. We decided Fable is unpredictable and cannot be trusted to the degree that Opus and Sonnet can be for any project beyond toy-scale quick wireframes, but Fable can be the best tool for quick UI/UX wireframing for non-technical roles. |
For context, my Claude Code working style is quite heavy on discussion "to align" before implementing anything. We also use a good amount of Markdown files.
Oh yeah, it also has way fewer "phrasing quirks" and is a clearer communicator. Opus 4.8 was a bit of a loon with some of its writing styles. I had mostly straightened it out, but not entirely. It would use the most ridiculous flair at times.