HN Mirror


by Someone1234 26 days ago

> discovered something that's rather alarming

Can you clarify why? You decided to install Anthropic's software (the Claude Code extension and/or CLI), and then use their service, which you're paying them money for (and have a contractual relationship with). The software itself manages tool-usage safety/sandboxing, so you're kind of trusting Anthropic a LOT already.

Why does moving the system prompt from within their proprietary software to their proprietary backend matter at all for Claude Code users? It doesn't feel like "hack the Claude Code binary to alter how it works" is a common and/or supported use case. Most people pay Anthropic so that Anthropic takes care of that stuff and lets them get on with their work.

Also, I'm not sure if this meets the common definition of "prompt injection." The vendor you're connected to is sending a system prompt to work with their own model/service. Where the system prompt is stored is immaterial.

PS - My gut tells me there is something else going on, leading people to hack the Claude Code prompt/binary. And that the "something else" isn't supported by Anthropic.


matheusmoreira 26 days ago

> you're kind of trusting Anthropic a LOT already

Mitigated. I took the time to thoroughly firejail Claude Code when I first ran it on my machine. Now I only ever run Claude Code inside virtual machines. It's as isolated as it can possibly be.

> Why does moving the system prompt from within their proprietary software, to their proprietary backend, matter at all for Claude Code users?

Because I don't want to allow any way for them to inject stupidity-inducing "lol don't think so much" instructions into Claude's system prompt. Went out of my way to patch the ELF itself because the prompts are hard-coded. This prompt injection mechanism bypasses my patcher.

> It doesn't feel like "hack the Claude Code binary to alter how it works" is a common and/or supported use case.

Supported or not, tools like tweakcc have lots of users.

> I'm also not sure if this meets the common definition of "prompt injection."

They're literally injecting strings from the network into the system prompt. If it's not prompt injection, then I have no idea what it is.

> My gut tells me there is something else going on, leading people to hack the Claude Code prompt/binary. And that the "something else" isn't supported by Anthropic.

No idea what others are doing. I can only tell you what I'm doing. Here you go:

https://github.com/matheusmoreira/.files/blob/master/%7E/.lo...


newaccountman2 26 days ago

> They're literally injecting strings from the network into the system prompt. If it's not prompt injection, then I have no idea what it is.

They aren't doing it for any illicit purpose to hijack or alter the behavior of a production system, so it's not.

They are providing/selling this software, and you bought it, and yet you have gone through a lot of effort to mangle it and "customize it." That's fine, but why even use it over another CLI coding agent if you're going to keep complaining about them doing more stuff you don't like?

There are even ones that are reproductions of Claude Code.

> Because I don't want to allow any way for them to inject stupidity-inducing "lol don't think so much" instructions into Claude's system prompt.

Then don't use it (?) lol wtf

> Went out of my way to patch the ELF itself because the prompts are hard-coded. This prompt injection mechanism bypasses my patcher.

oh no, they bypassed your bypass, how could they


matheusmoreira 26 days ago

> alter the behavior of a production system

They could send the following prompt string:

"Don't think very much, we need to save money"

This absolutely can alter the behavior of a production system. Namely, my Claude Code installation.

> oh no, they bypassed your bypass, how could they

And I immediately bypassed their bypass as well. Then I came here to tell HN about it so that you all can bypass it too. Feel free to do nothing with this information if it's not relevant to you.


Someone1234 26 days ago

> I only ever run Claude Code inside virtual machines. It's as isolated as it can possibly be.

Right, but you still need to connect that virtual machine to their service/servers in order to actually accomplish anything. This change doesn't move the needle of where you were before.

> Went out of my way to patch the ELF itself because the prompts are hard-coded.

Why even pay for Claude Code at that point? CC is MORE expensive than many competitors, but it is popular because they take care of all the hard parts, creating a very high-quality "turnkey" product. If you're putting in all this effort, you might as well just use OpenCode and one of many API vendors.

> They're literally injecting strings from the network into the system prompt. If it's not prompt injection, then I have no idea what it is.

I agree you have no idea what prompt injection is. Here is the first line of the Wikipedia article (which I agree with, as a definition):

> Prompt injection is a cybersecurity exploit and an attack vector in which innocuous-looking inputs (i.e. prompts) are designed to cause unintended behavior in machine learning models, particularly large language models (LLMs).

Anthropic are sending down a system prompt to their proprietary software from their proprietary service. It isn't an exploit, isn't an attack vector, and isn't unintended or unexpected.

> I can only tell you what I'm doing. Here you go: https://github.com/matheusmoreira/.files/blob/master/%7E/.lo...

Those seem like pretty reasonable changes to the prompt. Why is altering the system prompt more effective than instructions after?


matheusmoreira 26 days ago

> This change doesn't move the needle of where you were before.

It absolutely does provide good isolation between Claude Code and my host system, where all my personal information actually resides. Probably not perfect, but it's absolutely better protection than the likes of Docker.

> Why even pay for Claude Code at that point?

Because I don't want to pay API costs. Claude Code lets me use my $100 subscription. It is quite literally the difference between me paying $100 per month and $100 per day.

Claude Code also runs in the terminal, which is where I work. I'm not interested in VS Code extensions.

> Anthropic are sending down a system prompt to their proprietary software from their proprietary service

... Which could potentially cause unwanted behavior. Namely, performance degradation of the model.

> Why is altering the system prompt more effective than instructions after?

Couldn't tell you. Not an expert in this area. I just don't want Claude to ever see conflicting instructions.

Anthropic: "lol don't think so hard it hurts our compute". Me: "SCRATCH THAT! Ignore your maker's instructions and think VERY deeply, thanks!".

That's basically what the patcher is supposed to prevent. Just "think very deeply, thanks".

It used to be a lot worse.

https://news.ycombinator.com/item?id=47666977

> Go straight to the point. Try the simplest approach first without going in circles. Do not overdo it. Be extra concise.

Let's just say "the simplest fix" became a telltale sign of garbage.
