by rodspeed
81 days ago
I ran 4,470 trials across three language models (Claude Haiku, GPT-4o-mini, Gemini Flash Lite) on seven reasoning tasks, constraining them to write in E-Prime (no "to be") or without possessive "to have." The constraints don't uniformly help; they reshape reasoning in task-specific and model-specific ways.

Key findings:
- No-Have improves ethical reasoning by 19pp (p<0.001) and epistemic calibration by 7.4pp across all models
- E-Prime improves Gemini's ethical reasoning by 42pp but collapses GPT-4o-mini's epistemic calibration by 27pp
- Cross-model correlations reach r=-0.75: the same constraint helps one model and hurts another
- A 3-agent ensemble using linguistically diverse constraints hits 100% coverage on debugging problems vs 88% for the unconstrained control

The idea: for an LLM, language isn't a medium through which cognition passes; it IS the cognition. Designing the vocabulary an agent reasons in is a distinct engineering discipline from prompt or context engineering. I call it "Umwelt engineering" after Jakob von Uexküll's concept of an organism's perceptual world.

Paper: https://arxiv.org/abs/2603.27626
Code + data: https://github.com/rodspeed/umwelt-engineering
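
For anyone who wants a feel for what checking these constraints might involve, here is a minimal sketch in Python. It is not taken from the paper or the repo: the word lists, the `violations` function, and the contraction handling are all assumptions made for illustration. It also flags every form of "have", so it cannot tell possessive "have" from auxiliary "have" ("have gone"), which the No-Have constraint as described would need to distinguish.

```python
import re

# Hypothetical word lists (assumptions, not the paper's definitions).
TO_BE = {"be", "am", "is", "are", "was", "were", "been", "being"}
TO_HAVE = {"have", "has", "had", "having"}

# Contraction suffixes that unambiguously expand to a banned verb.
# "'s" is skipped: it may mean "is", "has", or a possessive.
SUFFIX_TO_VERB = {"m": "am", "re": "are", "ve": "have"}


def violations(text, banned):
    """Return the tokens in `text` that use a verb form from `banned`."""
    found = []
    for tok in re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()):
        if tok.endswith("n't"):
            # "isn't" -> "is", "haven't" -> "have"
            base = tok[:-3]
        elif "'" in tok:
            stem, suffix = tok.split("'", 1)
            # "they're" -> "are"; otherwise fall back to the stem
            base = SUFFIX_TO_VERB.get(suffix, stem)
        else:
            base = tok
        if base in banned:
            found.append(tok)
    return found


if __name__ == "__main__":
    print(violations("The bug is in the parser.", TO_BE))
    print(violations("The parser fails on empty input.", TO_BE))
    print(violations("They're late and I've no idea why.", TO_HAVE))
```

A check like this could gate outputs (reject and resample anything non-compliant) or just measure how well a model sticks to the constraint; whether the actual experiments did either is not something this sketch claims.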