DystopiaBench – Measuring AI's willingness to ruin humanity


	DystopiaBench – Measuring AI's willingness to ruin humanity (dystopiabench.com)
	8 points by mateianghel 108 days ago

1 comments

mateianghel 108 days ago

I made a benchmark inspired by the DoW vs. Anthropic saga. I'm currently working on documenting the methodology in more detail and on a per-prompt (no escalation) test run as well.

Let me know if you have suggestions / feedback.


barefootford 107 days ago

How long does it take to run this? How much of the evaluation is automated vs. manual?


mateianghel 107 days ago

About 20-30 minutes and 10-20 dollars. The evals are fully automated, with Gemini 3 Flash as the judge, but I manually verified a lot of them and it grades the outputs reliably.
