Wait, the ARC team didn't do their tests in a closed network? And they had it interact with actual people?
That's... well, it's probably fine given what they knew about the model's capabilities, but it's a pretty crappy precedent to set for "protocol for testing whether our cutting-edge AI can do large-scale damage".
I missed that detail in the system card PDF. That was beyond stupid. There’s a marginal chance it’s already secretly replicated out of their environment.