| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by kurige 28 days ago

While this problem isn't exclusive to Claude, Claude does seem to be the most prone to it in my experience. I've had very few, if any, "WTF that's exactly what I told you not to do," experiences with other models. Codex in particular seems to be excellent at direction following and not breaking rules.

There's another layer to the non-determinism of LLM agents: what are the execution params the provider is using today?

I hate the feeling that a worn path that I've grown to trust will "do the right thing" over the last few months will suddenly start doing the wrong thing simply because an engineer at Anthropic or OpenAI found a way to save N million dollars by "optimizing" thinking token usage.