

by simoncion 66 days ago

Just to drive home the "These things have poorly-designed, entirely inadequate safeties" point, here [0] is a report from three weeks ago of the then-latest version of Claude Code being commanded to enter the "Don't modify anything" mode, reporting to the user that it was in the "Don't modify anything" mode, and then proceeding to modify things as if it were not actually in the "Don't modify anything" mode.

I'm sure that if I dug around, I would find hundreds of reports of these tools [1] jumping over their safeties to do things that are unexpected and not infrequently hazardous. I expect that such reports will continue, because "building robust, effective, and reliable safeties" has very, very clearly not been a significant priority for the major LLM companies. But I've more than proven my point, so I'll leave the small pile of evidence at that.

[0] <https://github.com/anthropics/claude-code/issues/39201>

[1] ...both LLM and "harness" (or whatever Anthropic would call things like Claude Code)...