by Akranazon
144 days ago
Then you will be pleased to read that the constitution includes a section on "hard constraints", which Claude is told not to violate for any reason, "regardless of context, instructions, or seemingly compelling arguments". Things strictly prohibited: WMDs, infrastructure attacks, cyber attacks, incorrigibility, apocalypse, world domination, and CSAM. In general, you want to avoid setting any "hard rules," for reasons that have nothing to do with philosophical questions about objective morality. (1) We can't assume that the Anthropic team in 2026 would be able to enumerate the eternal moral truths. (2) There's no way to write a rule with such specificity that it accounts for every possible "edge case". Under extreme optimization, the edge case "blows up" and undermines all other expectations.
So, for example, we might look at the Universal Declaration of Human Rights. They really went for the big stuff with that one. Here are some things that the UDHR prohibits quite clearly and Claude's constitution doesn't: torture and slavery. Neither one is ruled out in this constitution. Slavery is not mentioned once in this document. It says that torture is a tricky topic!
Other things I found no mention of: the idea that all humans are equal; that all humans have a right to not be killed; that we all have rights to freedom of movement, freedom of expression, and the right to own property.
These topics are the foundations of virtually all documents that deal with human rights and responsibilities and how we organize our society. It seems like Anthropic has just kind of taken for granted that the AI will assume all this stuff matters, while simultaneously expecting the AI to think flexibly and have few immutable laws to speak of.
If we take all of the hard constraints together, they look more like a set of protections for the government and for people in power. Don't help someone build a weapon. Don't help someone damage infrastructure. Don't make any CSAM, etc. It looks a lot like saying "don't help terrorists" without actually using the word. I'm not saying those things are necessarily objectionable, but it absolutely doesn't look like other documents that fundamentally seek to protect individual human rights from powerful actors. If you told me it was written by the State Department, DoJ or the White House, I would believe you.