Yes: GPT-4 turbo could receive a meaningful correction and generally change its answer in that direction. GPT-4o is very, very resistant to doing this and will tend to parrot the previous answer, even after admitting it was in error.
I routinely fix this by toggling from GPT-4o to GPT-4t.
Having to constantly correct incorrect answers by LLMs only for them to apologize and give another incorrect answer is what made me lose complete interest in using them.
I figured if I'm knowledgable enough to correct LLMs it's more efficient to not use them at all. What's the point really? Am I teaching them? Because I felt like a teacher who is quizzing a student who keeps on guessing but failing.
I don't use LLMs to answer general problems for correctness. I use them for text formatting and rewriting superpowers. GPT-4t does a good job if I need it to iterate and change slightly what it does.
For example, to inform the University of California about the content of my courses, I have to go through a course articulation which is several pages long, is written in a formal academic voice, and is pretty time consuming to create. GPT-4t can take my informal course outline and an example of a past articulation that I've written and do the job to a point where I just need to ask it to make small changes for 10 minutes and then make a last couple edits myself. I turn a couple of hours to 10 minutes and 25 cents of API calls.
(Also, sometimes when it's explaining example assignments, it thinks of nice things to include that I hadn't planned on, and I end up shamelessly using them; other times it thinks of garbage and I have to coax it to articulate what I actually meant).
I'd say GPT-4o is slightly better at the task... except it commits so strongly to its answers in the context buffer that it doesn't do effective rewrites/corrections. So I've settled into a workflow of using GPT-4o to do initial work and then use GPT-4t for the final cleanup.
It feels like the answers are getting longer and longer too. Even for the most basic questions, which could be answered with 2 sentences. Does it have ADHD? Who wants to read all these wall of text?
But even the gpt3.5 answers were getting longer and longer. I don't know, I don't pay for it myself, we just have 4o at cagie and I don't know how that's different in terms of the tuning compared to the "normal" 4o
I routinely fix this by toggling from GPT-4o to GPT-4t.