|
> is their willingness to correct themselves when asked Except they don't correct themselves when asked. I'm sure we've all been there, many, many, many,many,many times .... - User: "This is wrong because X"
- AI: "You're absolutely right ! Here's a production-ready fixed answer"
- User: "No, that's wrong because Y"
- AI: "I apologise for frustrating you ! Here's a robust answer that works"
- User: "You idiot, you just put X back in there"
- and so continues the vicious circle....
|
With weak multi-turn instruction following, context data will often dominate over user instructions. Resulting in very "loopy" AI - and more sessions that are easier to restart from scratch than to "fix".
Gemini is notorious for underperforming at this, while Claude has relatively good performance. I expect that many models from lesser known providers would also have a multi-turn instruction following gap.