| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by pietz 756 days ago

I haven't noticed that GPT-4o hallucinates a lot more than the previous version but I noticed 2 other things of which especially the latter seems relevant here.

1) it's insanely chatty, to a point where it ignores instructions about not doing certain things. I think this behavior is heavily favoured by benchmarks but as somehow who expects concise answers, this model annoys me. Custom instructions don't fully fix this for me.

2) It likes repetitive answers a lot more than the previous version. Meaning that it will try its hardest to generate the followup answer in the same format as the first one. I think this is also the problem in your example.

To my understanding, this is a measure against laziness, where the model would exclude information from the first answer that haven't changed in the followup. I always liked this behavior but maybe you remember the time from a few months ago where many people complained about the laziness of (I believe) 0125.

Btw, while I type this, I notice that this is probably the highest level of first world problems I've ever complained about. There is this amazing almost free tool that answers all my questions and does most of my coding and I dislike it because it provides me with thorough context.

3 comments

me_vinayakakv 756 days ago

I am using GPT 4o and have observed both 1 and 2.

Based on a tweet in X[1], I had to add "I REPEAT" to the instruction to get the model to not to ignore instructions.

[1]: https://x.com/btibor91/status/1796077902959640893

link

emsign 756 days ago

Given that this is highly inefficient from a user's point of view and every request costs energy and someone else's money, I wouldn't call it a first world problem but a design flaw with more serious implications than just repercussions for annoyed customers. It stacks up.

link

laborcontract 756 days ago

The instruction ignoring piece is noticeable to the extent that it sometimes reminds me of 3.5-turbo. I just wonder if it’s a side effect of their training, or whatever they did to make the model more efficient.

link