Fable responded to that for me.
I'm nearly certain that blocking this class of prompt is a classifier mistake. No one at Anthropic thinks this kind of prompt should be gated.
The classifier is still classifying. The model was released to the public yesterday.
I'm pretty sure they have no idea what they're doing; I'm pretty sure nondeterministic systems cannot be aligned; I'm pretty sure they'll enshittify, the same way a dropped glass doesn't magically reassemble itself, even in an infinite-scenario universe; I'm pretty sure effective altruism is a failing philosophy that tricks people into thinking greed is good as long as they pinky swear they won't become a greedy asshole who just needs an excuse to be <a greedy asshole>.