| HN Mirror


by davmre 50 days ago

This sounds more or less unavoidable? Decompilers are inherently security-sensitive. If you take avoiding cyberattack uplift seriously as a goal, I don't see how you get around essentially refusing to work on them.

Obviously there are plenty of innocuous applications too, but it's not like the people building decompilers for nefarious reasons will be explicit about it. The LLM abstraction just inherently doesn't have enough context to distinguish your intentions or your broader use cases. This is why both Anthropic and OpenAI have had to create side-channel mechanisms for security researchers to establish a trusted use context. It sounds like this makes it not a viable product for you, unfortunately, and it makes sense that that's frustrating. But I also don't see what different behavior one could reasonably expect given the constraints.

If it's any consolation, these restrictions only make sense for models that are ahead of the open-weights frontier, so open-source hackers will presumably get Mythos-level capabilities in the relatively near future anyway.

2 comments

gck1 50 days ago

I'm not sure how the new guardrails work exactly, but I've read enough of Reddit and the Chinese communities focused on jailbreaking the models to know that you either have to nerf it to the point where it fires even on "kill the task", or someone (maybe even another LLM) is going to come up with a set of tokens that gets around the defenses.

Nerfed models are really bad for PR, especially when you're staking your company's future on it being the smartest, most dangerous thing in the world.

So I believe they will ease up on nerfing/guardrails just enough that bad actors will find a way, while good ones will stay limited on anything dual-use. Just like such restrictions usually work in other places.

P.S. Yes, "kill the task" did, in fact, result in a refusal AND a warning on my Claude account in Opus 4.8's early days.

zb3 50 days ago

> If you take avoiding cyberattack uplift seriously as a goal

This "uplift" risk obviously excludes the US. The goal of this is that the US bandits (like the NSA) will find exploits and attack other countries (classic US behaviour), while those other countries can't be allowed to defend against these attacks. NSA/CIA thugs are "trusted"; foreign defenders in sanctioned countries will of course be "untrusted".