| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by ACCount37 302 days ago

1-turn instruction following and multi-turn instruction following are not the same exact capability, and some AIs only "get good" at the former. 1-turn gets more training attention - because it's more noticeable, in casual use and benchmarks both, and also easier to train for.

With weak multi-turn instruction following, context data will often dominate over user instructions. Resulting in very "loopy" AI - and more sessions that are easier to restart from scratch than to "fix".

Gemini is notorious for underperforming at this, while Claude has relatively good performance. I expect that many models from lesser known providers would also have a multi-turn instruction following gap.

2 comments

vidarh 302 days ago

This is a good point, and to drive this home to people, if you have a conversation of this pattern:

    User: Fix this problem ...
    Assistant: X
    User: No, don't do X
    Assistant: Y
    User: No, Y is wrong too.
    Assistant: X

It is generally pointless to continue. You now have a context that is full of the assistant explaining to you and itself why X and Y are the right answers, and much less context of you explaining why it is wrong.

If you reach that state, start over, and constrain your initial request to exclude X and Y. If it brings up either again, start over, and constrain your request further.

If the model is bad at handling multiple turns without getting into a loop, telling it that it is wrong is not generally going to achieve anything, but starting over with better instructions often will.

I see so many people get stuck "arguing" with a model over this, getting more and more frustrated as the model keeps repeating variations of the broken answer, without realising they're filling the context with arguments from the model for why the broken answer is right.

recursive 302 days ago

This is also a thing that's bad about LLMs. You're holding it wrong if you continue to argue. But LLMs are presented as if we can use the conventions of natural language to communicate with them. That's how they're sold. So if they fail to live up to those expectations, that's still a problem with LLMs.

vidarh 302 days ago

It's a problem with LLM's and people are "holding it wrong".

It makes zero difference that they've been sold as doing better if other people learn how to use them effectively and I choose to ignore how to get the best possible results out of them.

asadotzler 302 days ago

Except that it's impossible to "hold it right" -- even when following the guidance from its makers.

vidarh 301 days ago

I have no problem "holding it right". Just today I had AI write 100% of the code for two different tools, using an AI assistant which wrote all the code for itself after the initial ~100 lines.

It's not hard to learn to be productive with these models.

xienze 302 days ago

> I see so many people get stuck "arguing" with a model over this, getting more and more frustrated as the model keeps repeating variations of the broken answer

Maybe because people expect AI systems that are touted as all-knowing, all-powerful, coming-for-your-job to be smart enough to remember what was said two turns ago?

vidarh 302 days ago

That's fine once or twice. At that point people should learn that this isn't how they work, and figure out how to use them better.

It's not a tools fault if people insist on continuing to use them in counter-productive ways.

asadotzler 302 days ago

It's not the tools fault when people RTFM (guidance from the tool maker) and use it as it's intended (again, by the tool maker, who presumably knows how it works and is in the best position to guide users).

"If you keep pressing the back button like the IE engineers told you to, of course you will fail to go back. To go back you want to press the forward button. Are you an idiot? Press the forward button to go back, at least until the next version release when you will need to press the reload button to go back. Trust me, eventually the back button will go back, but for now only fools press the back button to go back."

vidarh 301 days ago

No, it's not the tools fault if you continue to use it in ways that according to you, yourself does not work, despite the availability of better guidance.

Do you always insist on listening to guidance you've observed doesn't work?

It sounds immensely counter-productive.

Meanwhile I'll continue to have AI tools write the majority of my code at this point.

xienze 302 days ago

They’re non-deterministic, remember? So it’s not always the case that an LLM will get stuck in this sort of loop. Hence why people get frustrated when it happens and continue to think that perhaps it should be working on a more consistent basis.

vidarh 301 days ago

So are people.

It is no more productive to continue to go in circles with an argumentative person who refuses to see reason.

If someone haven't learnt that lesson, they will get poor results at a whole lot more things in life than talking to AI.

psadauskas 302 days ago

There's also the Pink Elephant Paradox (Whatever you do, DO NOT think about a pink elephant).

If you mention X or Y, even if they're preceded by "DO NOT" in all caps, an LLM will still end up with both X and Y into its context, making it more likely it gets used.

I'm running out of ways to tell the assistant to not use mocks for tests, it really really wants to use them.

vidarh 302 days ago

I think in some cases you "just" need to instead up temperature to increase the variety of responses, repeat requests, and use hooks to automatically review and reject bad options.

(And yes, it's a horrible workaround)

_flux 302 days ago

Indeed, arguing with LLM is good if you like arguing. For results it's not the way to go.

I think often it's not required to completely start over: just identify the part where it goes off the rails, and modify your prompt just before that point. But yeah, basically the same process.

vidarh 302 days ago

Sure, when working with tools - like Copilot for example - that lets you "restore" the conversation to a given point, that has pretty much the same effect. The key is to excise the "bad steps" from the conversation and figure out how to amend the next conversation steps so it doesn't veer off in the wrong direction.

dingnuts 302 days ago

I don't know why this could be the case but I have absolutely gotten better results out of the bot after insulting it.

QuantumGood 302 days ago

I always ask "Tell me what you think it is I am asking" before asking for a solution. Improves the solution and context.

rtkwe 302 days ago

I have this problem all the time with minor image edits on ChatGPT the few time's I've tried it. Any time I try to do a second edit or change to the generated image it seems to take the already degraded output from it's first attempt and use that instead of the original image.