| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by clover-s 128 days ago

Yes — I think the “scoreboard” framing is exactly the right way to think about it.

Optimization doesn’t inherently know what we intended. It only knows what the system makes visible and rewardable.

Once fast feedback, visibility, and ranking become dominant signals, optimizing for those signals naturally selects for the patterns that maximize them. This seems to be a general property of optimization, not something specific to any particular model.

That’s why slower feedback loops — where signals are delayed, contextual, and tied to longer-term interaction — may lead to very different behavioral equilibria.

In that sense, alignment may be less about correcting the agent itself, and more about designing the environment and feedback structure it operates within.