From limited experimentation: Sonnet 3.7 has “extended thinking” as an option, although the UI, at least in the app, leaves something to be desired. It also has a beta feature called “Analysis” that seems to work by having the model output JavaScript code as part of its response that is then run and feeds back into the answer. Both of these abilities are visible — users can see the chain of thought and the analysis code.
It seems, based again on limited experimentation doing sort-of-real work, that analysis works quite well and extended thinking is so-so. Whereas DeepSeek R1 seems to be willing and perhaps even encouraged to second-guess itself (maybe this is a superpower of the “wait” token), Sonnet 3.7 doesn’t seem to second-guess itself as much as it should. It will happily extended-think, generate a wrong answer, and then give a better answer after being asked a question that it really should have thought of itself.
(I’m not complaining. I’ve been a happy user of 3.7 for a whole day! But I think there’s plenty of room for improvement.)