Hacker News
by loudmax 60 days ago
Let's say we take Anthropic's security and alignment claims at face value, and they have models that are really good at uncovering bugs and exploiting software.

What should Anthropic do in this case?

Anthropic could immediately make these models widely available. The vast majority of their users just want to develop non-malicious software. But some non-zero portion of users will absolutely use these models to find exploits, develop ransomware, and so on. Making the models widely available forces everyone developing software (e.g., whatever browser and OS you're using to read HN right now) into a race where they have to find and fix all their bugs before malicious actors do.

Or Anthropic could slow-roll their models: gatekeep Mythos to select users like the Linux Foundation, and nerf Opus so it runs a bunch of checks that make it slightly more difficult to have it automatically generate exploits. Obviously, they can't entirely stop people from finding bugs, but they can introduce some speed bumps to dissuade marginal hackers. Theoretically, this gives maintainers some breathing space to fix outstanding bugs before the floodgates open.

In the longer run, Anthropic won't be able to hold back these capabilities because other companies will develop and release models that are more powerful than Opus and Mythos. This is just about buying time for maintainers.

I don't know that the slow release model is the right thing to do. It might be better if the world suffers through some short-term pain of hacking and ransomware while everyone adjusts to the new capabilities. But I wouldn't take that approach for granted, and if I were in Anthropic's position I'd be very careful about opening the floodgates.

2 comments

Couldn't we use DNS records to verify that a website is our own, for example with a TXT value provided by Anthropic?

Google does the same thing to verify that a website is your own. Security checks by the model would only kick off if you're working on a property that you've validated.
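A rough sketch of what that check could look like, in Python. Everything here is an assumption for illustration: the record prefix `model-site-verification=`, the helper names, and the idea of deriving the token from an account ID with an HMAC are all hypothetical, not anything Anthropic or Google has documented. The DNS lookup is passed in as a function because the Python standard library has no TXT resolver; a real service would plug in a DNS library there.

```python
import hashlib
import hmac
from typing import Callable, Iterable

# Hypothetical record prefix, made up for this sketch.
PREFIX = "model-site-verification="


def issue_token(account_id: str, domain: str, secret: bytes) -> str:
    """Derive the value the user is asked to publish in a TXT record."""
    msg = f"{account_id}:{domain.lower()}".encode("utf-8")
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()[:32]


def is_verified(
    account_id: str,
    domain: str,
    secret: bytes,
    lookup_txt: Callable[[str], Iterable[str]],
) -> bool:
    """Return True if any TXT record for the domain carries the expected token.

    lookup_txt is an injected resolver (assumption): it takes a domain and
    returns that domain's TXT record strings.
    """
    expected = (PREFIX + issue_token(account_id, domain, secret)).encode("utf-8")
    return any(
        hmac.compare_digest(record.strip().encode("utf-8"), expected)
        for record in lookup_txt(domain)
    )
```

Because the token is tied to both the account and the domain, copying someone else's published record doesn't verify a different account.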

Or they could check whether the code is open source and available on the internet, and if so, refuse to analyse it when the person requesting the analysis isn't affiliated with the project.

That will still leave closed source software vulnerable, but I suspect it is somewhat rare for hackers to have the source of the thing they are targeting, when it is closed source.

How can they tell if the software is closed or open source?

They would have to maintain a server-side hashmap of every open source file in existence.

And it'd be trivial to spoof. Just change a few lines and now it doesn't know if it's closed or open.
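To make the spoofing point concrete, here is a minimal sketch assuming the lookup is an exact SHA-256 digest of each file's contents (the set `known_open_source` and the helper `file_digest` are hypothetical names). Swapping two operands leaves the behavior the same but produces a completely different digest, so the lookup misses.

```python
import hashlib


def file_digest(source: str) -> str:
    """Exact-match fingerprint: SHA-256 of the file's bytes."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


# Stand-in for the server-side table of known open source files.
known_open_source = set()

original = "def add(a, b):\n    return a + b\n"
known_open_source.add(file_digest(original))

# Same behavior, one trivially edited line.
tweaked = original.replace("return a + b", "return b + a")

print(file_digest(original) in known_open_source)  # the untouched file is found
print(file_digest(tweaked) in known_open_source)   # the edited file is not
```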

Of course, just hashing the file wouldn't work; they would have to do something more complicated, a kind of perceptual hash. It's not easy, but I think it is doable.
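As a very rough illustration of the idea, here is a sketch that compares files by overlapping token sequences (shingles) and Jaccard similarity. To be clear, this is a simpler stand-in, not a real perceptual or locality-sensitive hash, and the tokenizer, the shingle size `k=5`, and the sample snippets are all assumptions picked for the example. The point is only that a small edit changes a few shingles instead of the whole fingerprint.

```python
import re


def shingles(source: str, k: int = 5) -> set:
    """Split source into crude tokens and return the set of k-token windows."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    if len(tokens) < k:
        return {tuple(tokens)}
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def similarity(a: str, b: str, k: int = 5) -> float:
    """Jaccard similarity of the two shingle sets, between 0.0 and 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa | sb)


open_source_file = (
    "def mean(values):\n"
    "    total = 0\n"
    "    for v in values:\n"
    "        total += v\n"
    "    return total / len(values)\n"
)
# Attacker renames the function so an exact hash no longer matches.
renamed_copy = open_source_file.replace("def mean", "def avg")
unrelated_file = (
    "class Stack:\n"
    "    def __init__(self):\n"
    "        self.items = []\n"
)

print(similarity(open_source_file, renamed_copy))    # stays high
print(similarity(open_source_file, unrelated_file))  # stays low
```

Whitespace changes don't affect the score here because the tokenizer ignores them, but renaming every identifier would still defeat it, so a production system would need something more robust.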

But then I suspect many parts of a closed source project resemble open source code, so you can't just refuse to analyze any code that contains open source parts. An attacker could also drop a few open source files into a "fake" closed source project, and presumably the LLM would not flag them because the ratio of open to closed source code would look normal. Still, that would raise the costs for attackers.