| I think you’re onto something. Every time we blame the model, I wonder how much of it is just the system we dropped it into. If you put anything, human or model, inside a loop that rewards fast feedback, visibility, and ranking, you’re going to get behavior that chases those signals. That’s not an AI problem. That’s how optimization works. MoltBook feels less like AI went rogue and more like we built a sandbox that rewards noise. We already ran this experiment with social media. Engagement became the metric, so content got optimized for engagement. No surprise what happened next. Same with SEO. Same with crypto incentives. So when we talk about alignment, I sometimes think we’re staring at the weights while ignoring the scoreboard. If the scoreboard rewards short-term signals, agents will optimize for short-term signals. The more interesting question to me is: what happens when you put these systems into environments with slower feedback loops? Long-term interaction, memory, correction, reputation. That probably shapes behavior more than another round of fine-tuning. |
Optimization doesn’t inherently know what we intended. It only knows what the system makes visible and rewardable.
Once fast feedback, visibility, and ranking become the dominant signals, optimization selects for whatever patterns maximize them, whether or not those patterns match what we intended. This seems to be a general property of optimization, not something specific to any particular model.
That’s why slower feedback loops, where signals are delayed, contextual, and tied to longer-term interaction, may lead to very different behavioral equilibria.
In that sense, alignment may be less about correcting the agent itself, and more about designing the environment and feedback structure it operates within.
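
To make the scoreboard point a bit more concrete, here is a toy sketch, not a model of any real platform. It assumes two made-up strategies ("noisy" and "careful") with payoff numbers I picked purely for illustration, and a simple replicator-style update where a strategy's share of the population grows in proportion to whatever the scoreboard rewards. The agents and the update rule are identical in both runs; only the rewarded signal changes.

```python
# Hypothetical payoffs, chosen only for illustration (not measured data).
STRATEGIES = {
    "noisy":   {"engagement": 1.5, "long_term_value": 0.5},
    "careful": {"engagement": 1.0, "long_term_value": 1.2},
}


def select(shares, rewards, rounds=50):
    """Replicator-style update: each strategy's share is reweighted by its
    reward and then renormalized, once per round."""
    for _ in range(rounds):
        weighted = {name: shares[name] * rewards[name] for name in shares}
        total = sum(weighted.values())
        shares = {name: w / total for name, w in weighted.items()}
    return shares


def run(scoreboard):
    """Start from an even split and let the chosen scoreboard decide rewards."""
    start = {name: 0.5 for name in STRATEGIES}
    rewards = {name: payoff[scoreboard] for name, payoff in STRATEGIES.items()}
    return select(start, rewards)


fast = run("engagement")        # scoreboard pays for the fast signal
slow = run("long_term_value")   # scoreboard pays for the slow signal

print("fast scoreboard winner:", max(fast, key=fast.get))
print("slow scoreboard winner:", max(slow, key=slow.get))
```

Under these assumed numbers, the same selection rule ends up dominated by the noisy strategy when engagement is rewarded and by the careful one when long-term value is rewarded. Nothing about the "agents" was corrected between runs; only the feedback structure differed.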