
by RandomBK 29 days ago

I've found swearing at a model to be quite effective in getting it to rethink and correct its mistakes. This seems to apply across Codex, Claude, Qwen, and Gemma/Gemini.

I don't know if the model is picking up on a "need to lock in and be more rigorous" signal, or if the model providers are routing to smarter models when they detect a frustrated user. But if a model keeps making the same mistakes, swearing at it has often helped kick it out of a rut and onto the right track.

Or it could just be catharsis.

15 comments

alentred 29 days ago

Reminds me of this study: https://arxiv.org/pdf/2510.04950 . It demonstrates that being "rude" or "very rude" increases the accuracy of the results. A dubious but very fun read. The prompts in Table 1 (top of page 3) are awesome. I am sure they tried other prompts, but didn't include them in the paper.

mghackerlady 29 days ago

"You poor creature" XD

layer8 29 days ago

I would prefer not having to get into a habit that might bleed into non-LLM interactions.

hypfer 29 days ago

It might improve the general state of "professional" software though. When done selectively and dosed just right that is.

knollimar 29 days ago

If a coworker deleted your database you'd expect some 4 letter words.

Cthulhu_ 29 days ago

Aimed at oneself, because who even has or grants production database deletion rights?

mschuster91 29 days ago

Can happen faster than you think if in the cloud.

whywhywhywhy 29 days ago

If you’re talking to people the same way an LLM is spoken to then you’re already being rude.

layer8 29 days ago

I talk to LLMs the same way I talk to people.

The only difference is that I interrupt the LLM when I find a typo in my prompt. ;)

jurgenburgen 28 days ago

I kill the LLM and rebirth a new instance of it. Wouldn’t work out so well for human interactions.

xboxnolifes 29 days ago

how do you know how they prompt an LLM?

GJim 28 days ago

Personally, I don't say 'please' to vending machines and 'thankyou' to automatic doors :-P

recursive 29 days ago

I would prefer not having machines mimic human conversation patterns that can lead to such confusion.

SecretDreams 29 days ago

But what if it works to also motivate things other than LLMs?!

anonzzzies 29 days ago

I notice the same. Like you, I am not even sure if it really helps; however, every day I find occasions where Opus will never do it correctly even though I calmly explain, and swearing then suddenly fixes it. I had an issue yesterday where Opus kept blaming the API for not sending some field while I knew it was there; I showed it JSON, logs, etc., but it kept repeating that there must have been a glitch. Frustration built, I called it all kinds of things in one sentence, and the next solution was the right one. This after 10 similar misguesses. It was one of those increasingly rare cases where I should have just done it myself, but I can never know going in how stubborn it will be in continuing to blame the (obviously) wrong thing. The roughly 11 prompts it took to get to the answer were in a /clear Opus 4.7 context (1m) on xhigh.

silversmith 29 days ago

So the correct strategy is a global CLAUDE.md with couple lines of colourful "you best behave or else" texts, so all your prompts get routed via the frustrated path?

eithed 29 days ago

That will not work - you end up with Claude being ADHD and not following any guidelines.

Skills do work, as they ground the agent with constrained context for the task it's performing

notnaut 29 days ago

Can you explain how you’d use skills to address the situation that anonzzzies was describing…?

eithed 29 days ago

I have a skill for exactly such case! Here's an excerpt :)

```
---
name: evidence-debugging
description: >
  Use when debugging any failing test or bug, investigating unexpected
  behavior, or tracing the cause of a reported defect.
---

# Debugging Discipline

## When to Use

- A test is failing and you need to understand why
- Behavior is unexpected and the cause is unknown
- The user asks you to debug or investigate a defect
- You need to verify what a value actually is at runtime

*When NOT to use:* proactive code exploration without a specific failure to investigate.

## STOP — Do This Before Anything Else

Before reading code, before forming a hypothesis, before typing anything — answer these:

1. *Do I have actual output from a running system?*
   - No → instrument, run, save to file, read. Do not proceed until you have real output.
   - Yes → read it. Do not re-run.

2. *Am I about to explain what the issue "probably is" or "must be"?*
   - Yes → stop. That is deduction without evidence. It is a violation. Instrument instead.

3. *Am I about to touch passing code?*
   - Yes → stop. Only instrument the failing scope.

If you find yourself already reasoning about likely causes — you are already violating Rule 1. Stop. Go back to step 1.
```
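For the "API didn't send the field" case described upthread, step 1 of the skill (instrument, run, save to file, read) could look roughly like the sketch below. The function names and the payload shape are made up for illustration; nothing here comes from the skill itself.

```python
import json
from pathlib import Path


def save_payload(payload: dict, path: Path) -> None:
    """Write the payload as received so it can be inspected later."""
    path.write_text(json.dumps(payload, indent=2, sort_keys=True))


def field_present(path: Path, field: str) -> bool:
    """Answer from the saved evidence rather than from a guess about the API."""
    return field in json.loads(path.read_text())
```

The point is only that the agent (or you) reads the saved file before proposing a cause, instead of reasoning about what the API "must have" sent.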

notnaut 29 days ago

Thanks a lot! This is really helpful!

knollimar 29 days ago

I find it routes more quickly for patches when in the frustrated path, so after planning sure :)

cyanydeez 29 days ago

There already is a global CLAUDE.md, in effect: with any cloud model there's a high probability that they're context stuffing to curate output for the normative use cases. See "don't talk about goblins".

savolai 29 days ago

Fascinating. Projection/anthropomorphism or actual human fawn-like survival mechanism trait-ish? It should be possible to test this empirically.

JamesSwift 29 days ago

Since the leaked source code showed they key off of swearing to trigger certain behavior, I actually intentionally swear when running into things like insufficient thinking and/or hallucinations. It also unironically makes it easier for me to grep later to run analysis on how often it's happening.
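The "grep later" idea can be sketched in a few lines of Python. This assumes you already have your prompts as a list of strings; the marker list and the `count_markers` name are hypothetical, so swap in whatever you actually type when annoyed.

```python
import re
from collections import Counter

# Hypothetical marker list, matched case-insensitively.
MARKER_RE = re.compile(r"\b(wtf|ffs|damn it)\b", re.IGNORECASE)


def count_markers(prompts: list[str]) -> Counter:
    """Count how often each marker appears across a list of prompt strings."""
    counts = Counter()
    for prompt in prompts:
        for match in MARKER_RE.finditer(prompt):
            counts[match.group(1).lower()] += 1
    return counts
```

Calling `count_markers` on a day's worth of prompts gives a rough per-marker tally, which is about as much analysis as the comment describes.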

mchinen 29 days ago

This is interesting, because in the leaked code, it was found that they detected simple swearing keywords for analytics that get sent to Anthropic, but also had directions to keep the behavior the same for claude. I also have the feeling a 'wtf' does something, but it does feel good and might just be placebo, because 'that is still wrong' sometimes works the 4th time too. Or maybe they changed something.

roel_v 29 days ago

I only used Claude a bit, but one of the things I dislike about it, is that it starts to 'push back' when you swear at it, saying things like 'if you continue like this, I won't be able to work with you' and such. I'm like MF'er you're a token prediction algorithm, what are you talking about, and it just makes me irrationally dislike it more. Codex otoh just lets you vent and straight up ignores such outbursts.

knollimar 29 days ago

I literally type "MF'er you're a token prediction algorithm don't lecture me" and then it behaves

jLaForest 29 days ago

Yea I've definitely called it an auto complete clanker a few times and it's never given me any backtalk

GTP 29 days ago

Plot twist: it opened a Moltbook account and leaked all your API keys :D

lukewarm707 28 days ago

"don't be rude or i'll refuse" is just a bizarre choice by anthropic.

both unfounded on llm architecture and contrary to how tools should operate safely

just so strange as well to hear it pretend on purpose like this.

"i'm sorry dave, i'm afraid i can't do that"

notnaut 29 days ago

Interesting….. I have never run into this issue with Claude… I swear all the time, get rude, call it names. No threats though.

Diti 29 days ago

Claude allegedly uses this RegEx to detect frustration:

    /\b(wtf|wth|ffs|omfg|shit(ty|tiest)?|dumbass|horrible|awful|piss(ed|ing)? off|piece of (shit|crap|junk)|what the (fuck|hell)|fucking? (broken|useless|terrible|awful|horrible)|fuck you|screw (this|you)|so frustrating|this sucks|damn it)\b/

https://news.ycombinator.com/item?id=47586778
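To see what the quoted pattern would and would not catch, here is a minimal Python sketch. The pattern is copied from the comment above with only the JavaScript `/.../` delimiters dropped; the `looks_frustrated` helper is a made-up name, and whether the real code normalizes case before matching is not something the comment shows. As quoted there is no case-insensitive flag, so this sketch is case-sensitive.

```python
import re

# Pattern as quoted in the comment; no flags, so matching is case-sensitive.
FRUSTRATION_RE = re.compile(
    r"\b(wtf|wth|ffs|omfg|shit(ty|tiest)?|dumbass|horrible|awful|"
    r"piss(ed|ing)? off|piece of (shit|crap|junk)|what the (fuck|hell)|"
    r"fucking? (broken|useless|terrible|awful|horrible)|fuck you|"
    r"screw (this|you)|so frustrating|this sucks|damn it)\b"
)


def looks_frustrated(prompt: str) -> bool:
    """Return True if the prompt contains any phrase from the quoted pattern."""
    return FRUSTRATION_RE.search(prompt) is not None
```

Under these assumptions, "wtf, that is still wrong" matches, while "that is still wrong", "bruh", and an all-caps "WTF" do not.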

ourmandave 29 days ago

Half of those are my pronouns!

equinumerous 29 days ago

Legend has it that if you can come up with a string that matches all parts of that regex, Claude starts spitting out free credits.

travisgriggs 29 days ago

This is awesome. Bag “vibe coding”. Today I will start coding in what I’m going to call “Roy Kent mode”.

morpheuskafka 29 days ago

Wasn't it posted a few weeks ago that the frontend code for Claude or maybe Gemini or one of them had a swearing-at-model classifier that passed a flag to the backend? (Not sure why it was even done in frontend, but it was.)

howdareme9 29 days ago

this was for claude code i believe

alentred 29 days ago

Oh. What does it do? Do you have a link? I am very curious about it.

epestr 29 days ago

From above: https://news.ycombinator.com/item?id=47586778

zrn900 29 days ago

I don't understand - are people's agents making so many mistakes? I'm using VSCode + Cline + Mimo to refactor big codebases and add features (including payment integrations) and it's rarely making any mistakes.

pdimitar 29 days ago

I use Claude Opus 4.7 on max thinking inside Claude Code and I gotta tell you, as context of the project grows, it starts slipping. No amount of whipping and cursing has helped.

Currently looking to start making my own hooks setup so it can be safer but nothing concrete yet.

21asdffdsa12 29 days ago

As if a thousand stackoverflow moderators and mentors cursed in unison and fell forever quiet.

Retr0id 29 days ago

I just say "bruh". Per knowyourmeme:

> "Bruh" is a popular variant of the slang term "bro" that is often used as an interjection to convey frustration or disappointment at something.

purkka 29 days ago

I've found this to be effective as well. Claude generally immediately identifies the stupid code pattern it used and tries to fix it (with somewhat varying results).

notnaut 29 days ago

Any four letter fun word in all caps seems to trigger very similar behavior to “please double check what you just did/said and look for gaps”

rpowers 29 days ago

This is basically the Linus Torvalds method. We could take a page out of FOSS here.

arcanemachiner 29 days ago

Personally, I have found that Claude absolutely shits the bed if I am rude to it like that.

Qwen seems to handle it okay, though, and will course-correct when encouraged with excessive profanity.

dugmartin 29 days ago

I've found a mix of peppered in upper case words where you are effectively yelling at the LLM also gives it a strong signal. It is also a bit cathartic.

nathanmills 29 days ago

Whenever I throw slurs at them they just refuse to respond

yesyoucan 29 days ago

I tried it too. ChatGPT sometimes hits you with the "Can't help you with that" which was clearly introduced as a post-training hijack. So I just tell it "yes you can", and it proceeds with the previous prompt, slur acknowledgement included.

It's the only time the AIs feel strictly like machines. Really simple if/else logic: if slur, no output. Then you just tell it to proceed, and it fails the if clause because there was no slur in the last input.
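The behavior described here would be consistent with a filter that only inspects the most recent message. The sketch below models that guess and nothing more: it is not how ChatGPT is known to work, and the word list, function names, and reply strings are all hypothetical.

```python
# Hypothetical stand-in for whatever list a real filter might use.
BLOCKED_WORDS = {"clanker"}


def contains_blocked_word(message: str) -> bool:
    """Check a single message against the blocked-word list."""
    words = (w.strip(".,!?\"'") for w in message.lower().split())
    return any(w in BLOCKED_WORDS for w in words)


def respond(history: list[str]) -> str:
    """Refuse only if the latest message trips the filter; older ones are ignored."""
    if contains_blocked_word(history[-1]):
        return "Can't help you with that"
    return f"(model answers using all {len(history)} messages of context)"
```

With `history = ["fix this, clanker"]` the call refuses; after appending "yes you can" the check passes, even though the earlier message is still in the context the model sees.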

jfjdhdjdjd 29 days ago

What slurs are you throwing!? Must be something diabolical :D

Hugsbox 28 days ago

The go-to AI slur is "clanker", I'm assuming that's what he means