Hacker News

by rodspeed 81 days ago

I ran 4,470 trials across three language models (Claude Haiku, GPT-4o-mini, Gemini Flash Lite) on seven reasoning tasks, constraining them to write either in E-Prime (no forms of "to be") or without possessive "to have" (No-Have). The constraints don't uniformly help; they reshape reasoning in task-specific and model-specific ways.
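For anyone unfamiliar with E-Prime, a rough compliance check on model output might look like the sketch below. It's a hypothetical helper, not the paper's code: the word list is my assumption, and it deliberately skips "'s" contractions because "it's" can mean "it is" or "it has". Checking the No-Have constraint is harder, since you'd need part-of-speech tagging to tell possessive "I have a car" from auxiliary "I have finished", so it's left out here.

```python
import re

# Assumed word list for illustration (hypothetical, not from the paper):
# forms of "to be", including negated contractions.
TO_BE_WORDS = {
    "be", "am", "is", "are", "was", "were", "been", "being",
    "isn't", "aren't", "wasn't", "weren't",
}
# Suffix contractions that always stand for "am" / "are".
# "'s" is skipped on purpose: it can mean "is", "has", or a possessive.
TO_BE_SUFFIXES = ("'m", "'re")


def e_prime_violations(text: str) -> list[str]:
    """Return tokens in `text` that use a form of "to be" (hypothetical helper)."""
    normalized = text.lower().replace("\u2019", "'")  # fold curly apostrophes
    # A word, optionally followed by one apostrophe-suffix ("i'm", "isn't").
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", normalized)
    return [
        tok for tok in tokens
        if tok in TO_BE_WORDS or tok.endswith(TO_BE_SUFFIXES)
    ]


def is_e_prime(text: str) -> bool:
    """True when no "to be" form was detected by the heuristic above."""
    return not e_prime_violations(text)


if __name__ == "__main__":
    print(e_prime_violations("The answer is clear, but I'm not sure they were right."))
    print(is_e_prime("The function returns a list."))
```

A word-list filter like this will miss some constructions and can't judge whether a rewrite preserves meaning; it only flags surface violations, which is enough to reject obviously non-compliant outputs.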

Key findings:

- No-Have improves ethical reasoning by 19pp (p<0.001) and epistemic calibration by 7.4pp across all models
- E-Prime improves Gemini's ethical reasoning by 42pp but collapses GPT-4o-mini's epistemic calibration by 27pp
- Cross-model correlations reach r=-0.75: the same constraint helps one model and hurts another
- A 3-agent ensemble using linguistically diverse constraints hits 100% coverage on debugging problems vs 88% for the unconstrained control
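One plausible reading of "coverage" for the ensemble result is the fraction of problems that at least one agent solves. The sketch below assumes that definition (the paper may define it differently), and the function name and toy data are hypothetical, not the study's results. It shows why agents that fail on *different* problems can push union coverage above any single agent.

```python
# Hypothetical helper: a problem counts as covered if any agent solves it.
def ensemble_coverage(solved_by_agent: dict[str, set[int]], problems: set[int]) -> float:
    """Fraction of `problems` solved by at least one agent (assumed definition)."""
    covered = set().union(*solved_by_agent.values()) & problems
    return len(covered) / len(problems)


# Made-up toy data for illustration only; not the paper's numbers.
TOY_PROBLEMS = set(range(8))
TOY_SOLVED = {
    "control": {0, 1, 2, 3, 4, 5},  # unconstrained agent
    "e_prime": {0, 1, 2, 6},        # weaker alone, but solves problem 6
    "no_have": {1, 3, 7},           # weaker alone, but solves problem 7
}

if __name__ == "__main__":
    single = ensemble_coverage({"control": TOY_SOLVED["control"]}, TOY_PROBLEMS)
    combined = ensemble_coverage(TOY_SOLVED, TOY_PROBLEMS)
    print(f"control alone: {single:.0%}, ensemble: {combined:.0%}")
```

In this toy case neither constrained agent beats the control on its own, yet together they fill the control's gaps, which matches the intuition behind using linguistically diverse constraints.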

The idea: for an LLM, language isn't a medium through which cognition passes; it IS the cognition. Designing the vocabulary an agent reasons in is an engineering discipline distinct from prompt engineering or context engineering. I call it "Umwelt engineering," after Jakob von Uexküll's concept of the Umwelt, an organism's perceptual world.

Paper: https://arxiv.org/abs/2603.27626
Code + data: https://github.com/rodspeed/umwelt-engineering