Funny you say that! When the two new models were released Friday I spun up mayors for each. (But didn’t do the prompting in the most scientific way.)
Have also thought about using openrouter and getting one mayor per model running the same prompt through all of them to create potentially the world's dumbest LLM benchmark.