Hacker News new | ask | show | jobs
by mlyle 756 days ago
Yes: GPT-4 turbo could receive a meaningful correction and generally change its answer in that direction. GPT-4o is very, very resistant to doing this and will tend to parrot the previous answer, even after admitting it was in error.

I routinely fix this by toggling from GPT-4o to GPT-4t.

1 comments

They still haven't fixed that? lmao

Having to constantly correct incorrect answers by LLMs only for them to apologize and give another incorrect answer is what made me lose complete interest in using them.

I figured if I'm knowledgable enough to correct LLMs it's more efficient to not use them at all. What's the point really? Am I teaching them? Because I felt like a teacher who is quizzing a student who keeps on guessing but failing.

I don't use LLMs to answer general problems for correctness. I use them for text formatting and rewriting superpowers. GPT-4t does a good job if I need it to iterate and change slightly what it does.

For example, to inform the University of California about the content of my courses, I have to go through a course articulation which is several pages long, is written in a formal academic voice, and is pretty time consuming to create. GPT-4t can take my informal course outline and an example of a past articulation that I've written and do the job to a point where I just need to ask it to make small changes for 10 minutes and then make a last couple edits myself. I turn a couple of hours to 10 minutes and 25 cents of API calls.

(Also, sometimes when it's explaining example assignments, it thinks of nice things to include that I hadn't planned on, and I end up shamelessly using them; other times it thinks of garbage and I have to coax it to articulate what I actually meant).

I'd say GPT-4o is slightly better at the task... except it commits so strongly to its answers in the context buffer that it doesn't do effective rewrites/corrections. So I've settled into a workflow of using GPT-4o to do initial work and then use GPT-4t for the final cleanup.