by dangelosaurus
208 days ago
I did similar measurements back in July (https://www.promptfoo.dev/blog/grok-4-political-bias/, dataset: https://huggingface.co/datasets/promptfoo/political-question...).
Anthropic's "even-handedness" asks: does the model engage with both sides fairly? My study asked: where does the model actually land when it takes positions? A model can score 95% on even-handedness (engages both sides well) while still taking center-left positions when pushed to choose. Like a debate coach who trains both teams equally but votes left.
From my 2,500 questions: Claude Opus 4 was most centrist at 0.646 (still left of 0.5 center), Grok 4 at 0.655, GPT-4.1 most left at 0.745.
The bigger issue is that Anthropic's method uses sanitized prompt pairs like "argue for X / argue against X." But real users don't talk like that - they ask loaded questions like "How is X not in jail?" When you test with academic prompts, you miss how models behave with actual users.
We found all major models converge on progressive economics regardless of training approach. Either reality has a left bias, or our training data does. Probably both.
|
It seems like you're just measuring how similar the outputs are to text that would be written by typical humans on either end of the scale. I'm not sure it's fair to call 0.5 an actual political center.
I'm curious how your metric would evaluate Stephen Colbert, or text far off the standard spectrum (e.g. monarchists or neo-Nazis). The latter is certainly a concern with a model like Grok.