by lionkor 756 days ago
I found that, compared to GPT-3.5, it refuses to shut up when told to shut up.

In the middle of a conversation, try going "SHUT UP, STOP TALKING ALREADY". For me, it just keeps repeating the last output. Very cool.

2 comments

Yes: GPT-4 turbo could receive a meaningful correction and generally change its answer in that direction. GPT-4o is very, very resistant to doing this and will tend to parrot the previous answer, even after admitting it was in error.

I routinely fix this by toggling from GPT-4o to GPT-4t.

They still haven't fixed that? lmao

Having to constantly correct wrong answers from LLMs, only for them to apologize and give another wrong answer, is what made me completely lose interest in using them.

I figured that if I'm knowledgeable enough to correct LLMs, it's more efficient not to use them at all. What's the point, really? Am I teaching them? Because I felt like a teacher quizzing a student who keeps guessing and keeps failing.

I don't use LLMs for general problems where I need the answer to be correct. I use them for their text formatting and rewriting superpowers. GPT-4t does a good job when I need it to iterate and slightly change what it produced.

For example, to inform the University of California about the content of my courses, I have to write a course articulation, which is several pages long, is written in a formal academic voice, and is pretty time-consuming to create. GPT-4t can take my informal course outline and an example of a past articulation that I've written and do the job to the point where I just need to ask it for small changes for 10 minutes and then make a last couple of edits myself. I turn a couple of hours into 10 minutes and 25 cents of API calls.

(Also, sometimes when it's explaining example assignments, it thinks of nice things to include that I hadn't planned on, and I end up shamelessly using them; other times it thinks of garbage and I have to coax it to articulate what I actually meant).

I'd say GPT-4o is slightly better at the task... except it commits so strongly to the answers already in its context buffer that it doesn't do effective rewrites or corrections. So I've settled into a workflow of using GPT-4o for the initial work and then GPT-4t for the final cleanup.
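A rough Python sketch of that draft-then-cleanup workflow, in case it helps anyone. Everything here is an assumption for illustration: `Model` is just "any function from prompt text to reply text", `draft_then_revise` is a made-up name, and no real client library is called. Sending each correction in a fresh prompt that contains only the current text and the note is my guess at one way to sidestep the context-commitment problem, not something I've measured.

```python
from typing import Callable, List

# Hypothetical stand-in: a "model" is any function mapping a prompt to a reply.
# Wire it to whatever API client you actually use.
Model = Callable[[str], str]


def draft_then_revise(
    draft_model: Model,
    revise_model: Model,
    outline: str,
    example: str,
    corrections: List[str],
) -> str:
    """Draft with one model, then apply each correction with another."""
    draft_prompt = (
        "Write a formal course articulation from the informal outline, "
        "matching the voice and structure of the example.\n\n"
        f"Outline:\n{outline}\n\nExample articulation:\n{example}"
    )
    text = draft_model(draft_prompt)

    # Each correction goes out as a fresh prompt holding only the current
    # text and one note, so earlier replies are not carried along as context.
    for note in corrections:
        revise_prompt = (
            "Revise the document below. Change only what the note asks for.\n\n"
            f"Note: {note}\n\nDocument:\n{text}"
        )
        text = revise_model(revise_prompt)

    return text
```

The two model arguments can point at the same backend or at different ones; passing them in separately just makes the switch between the drafting model and the cleanup model explicit.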

It feels like the answers are getting longer and longer too, even for the most basic questions that could be answered in 2 sentences. Does it have ADHD? Who wants to read all these walls of text?

Well, it's paid by the token.

Oh. Oooh! Yes. And "I don't know." isn't a lot of tokens. So where's the incentive there, lol.

But even the GPT-3.5 answers were getting longer and longer. I don't know, I don't pay for it myself; we just have 4o at cagie, and I don't know how its tuning differs from the "normal" 4o.