Making it state its reasoning can help it uncover mistakes. Note that I'm asking a second model to do this, not the original one; otherwise I would not expect a different result.
I would totally expect a different result even on the same model, especially if you're doing this via a chat interface (vs. the API), where you can't control the temperature.
But yes, it'll be more effective on a different model.
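To make the temperature point concrete, here's a rough toy sketch. It's not how any particular chat product works; the logits and the `sample_with_temperature` helper are made up for illustration. The idea is just that at temperature 0 you always get the top-scoring token, while at a nonzero temperature repeated runs on the same model can pick different tokens.

```python
import math
import random


def sample_with_temperature(logits, temperature, rng):
    """Pick a token index from raw scores after temperature scaling (toy example)."""
    if temperature <= 0:
        # Treat temperature 0 as greedy decoding: always take the highest score.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    # Subtract the max before exponentiating for numerical stability.
    weights = [math.exp(s - top) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.5, 0.5]
rng = random.Random(0)

# Collect the distinct choices made over 200 repeated "runs" at each setting.
greedy = {sample_with_temperature(logits, 0.0, rng) for _ in range(200)}
sampled = {sample_with_temperature(logits, 1.0, rng) for _ in range(200)}

print("temperature 0.0 picks:", greedy)
print("temperature 1.0 picks:", sampled)
```

With greedy decoding the set of picks collapses to a single index; with temperature 1.0 you should see more than one, which is the "different result on the same model" effect.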