Hacker News
Claude's Values Tested in 700K Chats (venturebeat.com)
1 point by hanson108 424 days ago
1 comment

Anthropic analyzed 700,000 real conversations to check whether Claude behaves the way it was designed to. Its behavior mostly aligns with Anthropic's “helpful, honest, harmless” goals, but some edge cases raise big questions.

How should we define and measure values in AI systems, and who decides what they should be?