| Hi, I'm the CISO from Anthropic. Thank you for the criticism, any feedback is a gift. We have laid out in our RSP what we consider the next milestone of significant harms that we're are testing for (what we call ASL-3): https://anthropic.com/responsible-scaling-policy (PDF); this includes bioweapons assessment and cybersecurity. As someone thinking night and day about security, I think the next major area of concern is going to be offensive (and defensive!) exploitation. It seems to me that within 6-18 months, LLMs will be able to iteratively walk through most open source code and identify vulnerabilities. It will be computationally expensive, though: that level of reasoning requires a large amount of scratch space and attention heads. But it seems very likely, based on everything that I'm seeing. Maybe 85% odds. There's already the first sparks of this happening published publicly here: https://security.googleblog.com/2023/08/ai-powered-fuzzing-b... just using traditional LLM-augmented fuzzers. (They've since published an update on this work in December.) I know of a few other groups doing significant amounts of investment in this specific area, to try to run faster on the defensive side than any malign nation state might be. Please check out the RSP, we are very explicit about what harms we consider ASL-3. Drug making and "stuff on the internet" is not at all in our threat model. ASL-3 seems somewhat likely within the next 6-9 months. Maybe 50% odds, by my guess. |
Their is also an other scene in Nolan's OppenHeimer (who made the cut around timestamp 27:45) where physicists get all excited when a paper is published where Hahn and Strassmann split uranium with neutrons. Alvarez the experimentalist replicate it happily, while being oblivious to the fact that seems obvious to every theoretical physicist : It can be used to create a chain reaction and therefore a bomb.
So here is my question : how do you contain the sparks of employees ? Let's say Alvarez comes all excited in your open-space, and speak a few words "new algorithm", "1000X", what do you do ?