Hacker News
by feigewalnuss 64 days ago
Disclosure: I wrote the linked post.

The "gave it my Gmail password" problem has a better answer than "don't do that." Security kicks itself out of the room when it only says no. Reserve the no for the worst days. The rest of the time, ship a better way.

That's why I built the platform to make credential leaks hard. Leaking a credential takes more than a single prompt. The credential vault is encrypted. Typed secret wrappers prevent accidental logging and serialization. Per-channel process isolation means a compromise in one adapter does not hand an attacker live sessions in the others.
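
For readers who have not seen the pattern, here is a minimal Python sketch of what a typed secret wrapper can look like. The `Secret` class and its `expose()` method are hypothetical names for illustration, not the platform's actual API.

```python
import json
import pickle


class Secret:
    """Hypothetical wrapper that resists accidental logging and serialization."""

    __slots__ = ("_value",)

    def __init__(self, value: str) -> None:
        self._value = value

    def expose(self) -> str:
        # The only deliberate way to read the raw value.
        return self._value

    def __repr__(self) -> str:
        # Used by logging, print, and f-strings, so the value never shows up there.
        return "Secret(****)"

    __str__ = __repr__

    def __reduce_ex__(self, protocol):
        # Refuse pickling and copying so the value is not serialized by accident.
        raise TypeError("Secret cannot be serialized")


token = Secret("hunter2")
print(f"loaded {token}")  # prints: loaded Secret(****)
```

`json.dumps(token)` also fails with a `TypeError`, because the class is not a JSON-serializable type. Code that genuinely needs the value has to call `expose()`, which makes every use easy to find in review.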

"Don't do that" fails even for users trying their hardest. Good engineering makes mistakes hard and the right answer easy. Architecture carries the weight so the user does not have to.

On the trifecta being "sorta-kinda solved" by newer models, no. Model mitigations are a layer, not a substitute. Prompt injection has the shape of a confused-deputy problem and the answer to confused deputies has always been capabilities and isolation, not asking the already confused deputy to try harder.

You want the injection to fail EVEN when the model does not catch it.

1 comments

Thanks. Yeah, I skipped that part in my comment. There are solutions for a lot of this stuff.

The one I see the most is brokers. Agent talks to a thing, thing has credential and does the task for the agent. Or proxies that magically inject tokens.
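
Roughly, the broker idea looks like this in Python. It is a sketch under assumptions: `Broker`, `perform`, and the `list_unread` action are made-up names, and the upstream API call is stubbed out.

```python
from dataclasses import dataclass, field


@dataclass
class Broker:
    """Hypothetical broker: it holds the token and exposes only narrow actions."""

    _token: str = field(repr=False)  # kept out of repr() so it does not land in logs
    allowed_actions: frozenset = frozenset({"list_unread"})

    def perform(self, action: str) -> str:
        if action not in self.allowed_actions:
            raise PermissionError(f"action not allowed: {action}")
        # A real broker would call the upstream API here using self._token.
        return f"ok: {action}"


def agent_step(broker: Broker) -> str:
    # The agent is handed the broker, not the token.
    return broker.perform("list_unread")


broker = Broker("example-token")
print(agent_step(broker))  # prints: ok: list_unread
```

In this single-process sketch the token is still reachable by code running in the same interpreter. The real version puts the broker in a separate process or host, so the agent can only send it requests.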

I think this only works for credentials though?

It doesn't solve the personal information part (e.g. your actual emails), right?

As for security, my solution was: keep it simple and limit blast radius.

Expect it to blow things up, and set things up so it doesn't matter when it happens.

I don't like docker so I just made a Linux user called agent. Agent can blow up all the files in its own homedir, and cannot read mine.

I felt really clever until I realized there's an even better solution: just give it a laptop (or Mac mini, or server, or whatever we're doing this week).

Same result but less pain in my ass. Switching users is annoying (and sharing files, and permission issues...). Also, worrying about which user I'm running stuff as... The thing just shouldn't be on my machine in the first place. It should have its own!

Functionally, its own Linux user or root on a $3 VPS are the same thing. It blows up the VPS, I just reset it.

For keys, I don't do anything fancy. It can leak all my keys. But if anyone steals them, they can exhaust my entire $5 prepaid balance ;) Blast radius limited.

But yeah, needs, tastes and preferences may differ.

Right, we have to see credentials and personal data as different problems. Wirken addresses the first directly and the second only partially. Session scoping keeps injection damage inside one channel's scope, so a poisoned email cannot reach into your Telegram credentials. The model still reads the email content during that session, and any prompt injection in that content can still act within whatever that one session can reach.
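
To make the scoping idea concrete, here is a toy Python sketch. `Vault`, `Session`, and `credential_for` are hypothetical names, not Wirken's actual API, and a real system would enforce this across process boundaries instead of inside one interpreter.

```python
class Session:
    """Hypothetical per-channel session: it holds only its own channel's credential."""

    def __init__(self, channel: str, credential: str) -> None:
        self.channel = channel
        self._credential = credential

    def credential_for(self, channel: str) -> str:
        # A session can only ever answer for the channel it was opened on.
        if channel != self.channel:
            raise PermissionError(f"{self.channel} session cannot reach {channel}")
        return self._credential


class Vault:
    """Hypothetical vault that hands each session exactly one credential."""

    def __init__(self, secrets: dict[str, str]) -> None:
        self._secrets = dict(secrets)

    def open_session(self, channel: str) -> Session:
        return Session(channel, self._secrets[channel])


vault = Vault({"email": "email-token", "telegram": "telegram-token"})
email_session = vault.open_session("email")
```

Here `email_session.credential_for("telegram")` raises `PermissionError` no matter what the email content told the model to do, because the session was never given that credential.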

The layer that addresses content-level flow is information-flow enforcement above identity. TriOnyx (https://github.com/tri-onyx/tri-onyx) looks at that exact problem: taint and sensitivity tracking, gateway kills on threshold breach.
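
As a rough illustration of taint and sensitivity tracking with a kill threshold, here is a Python sketch. This is not TriOnyx's API or scoring rule: the `Exposure` class, the integer levels, the multiplication, and the `max_risk` default are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass


class GatewayKilled(Exception):
    """Raised when the gateway refuses an outbound action."""


@dataclass
class Exposure:
    taint: int = 0        # highest level of untrusted content read so far
    sensitivity: int = 0  # highest level of private data seen so far

    def record(self, taint: int = 0, sensitivity: int = 0) -> None:
        # Levels only ratchet up: once exposed, the session stays exposed.
        self.taint = max(self.taint, taint)
        self.sensitivity = max(self.sensitivity, sensitivity)


def send_external(exposure: Exposure, payload: str, max_risk: int = 3) -> str:
    # Assumed rule: risk is high only when tainted input and sensitive data meet.
    risk = exposure.taint * exposure.sensitivity
    if risk > max_risk:
        raise GatewayKilled(f"risk {risk} exceeds threshold {max_risk}")
    return f"sent: {payload}"
```

With these numbers, a session that has read untrusted web content (taint 2) can still send externally until it also touches private mail (sensitivity 2). At that point the risk is 4 and the gateway refuses.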

It complements Wirken. You need identity before you can meaningfully ask what agent A has been exposed to.

On the agent-gets-its-own-machine approach, that is fine as a blast-radius strategy and I have no quarrel with it. It trades isolation between channels for isolation between the agent and the host. If you only have one channel and disposable keys, it works. It stops working as soon as the agent holds something you cannot cheaply rotate, which for most people ends up being their messaging identities.

You mean like giving Claude your HN password? ;)