by davmre
2 days ago
This sounds more or less unavoidable? Decompilers are inherently security-sensitive. If you take avoiding cyberattack uplift seriously as a goal, I don't see how you get around essentially refusing to work on them. Obviously there are plenty of innocuous applications too, but it's not like the people building decompilers for nefarious reasons will be explicit about it. The LLM abstraction just inherently doesn't have enough context to distinguish your intentions or your broader use cases. This is why both Anthropic and OpenAI have had to create side-channel mechanisms for security researchers to establish a trusted use context. Unfortunately, it sounds like this makes it not a viable product for you, and it makes sense that that's frustrating. But I also don't see what different behavior one could reasonably expect given the constraints. If it's any consolation, these restrictions only make sense for models that are ahead of the open-weights frontier, so open-source hackers will presumably get Mythos-level capabilities in the relatively near future anyway.
|
Nerfed models are really bad for PR, especially when you're staking your company's future on your model being the smartest, most dangerous thing in the world.
So I believe they will ease up on nerfing/guardrails just enough that bad actors will find a way, while good ones will stay limited on anything dual-use. Just like such restrictions usually work in other places.
P.S. Yes, "kill the task" did, in fact, result in a refusal AND a warning on my Claude account in Opus 4.8's early days.