HN Mirror


by aed 126 days ago

Funny you say that! When the two new models were released on Friday, I spun up a mayor for each (though I didn't do the prompting in the most scientific way).

Mayor Compounded Wonder - Claude Opus 4.6

https://hallucinatingsplines.com/mayors/compounded-wonder-2c...

Mayor Bronze Offramp - OpenAI Codex 3.6

https://hallucinatingsplines.com/mayors/bronze-offramp-09941...

TL;DR: Opus won.

I've also thought about using OpenRouter to run one mayor per model, feeding the same prompt to all of them, to create what might be the world's dumbest LLM benchmark.