Hacker News
by cadamsdotcom 20 days ago
Every issue with AI output lies somewhere on a continuum of how challenging it is to mitigate with pure model training.

Syntax is solved.

But getting agents to write secure code currently seems beyond what can be trained into models even with synthetic data. Or maybe the big labs haven’t tried yet.

Regardless, if you truly care about your AI’s output having some property, the only way is to codify what work with that property looks like, then create deterministic hooks and checks that refuse to let the AI stop until it has passed the bar.

Skills, MCPs, “you MUST do this” in your agent instructions... it’s all just new ways to waste tokens trying to asymptotically approach what good work looks like.

You will never reliably get acceptable work unless you build deterministic checking, and enforce that checking in a way the model can’t bypass or ignore.

Look into Claude Code hooks - your hook can be a script, and if it exits with exit code 2, it’ll block the model and show it the script’s output. A stop hook can check the model’s work and block its attempt to stop if the work doesn’t meet your bar. The script output can describe what still needs to be fixed and for bonus points, where (line 567 uses untrusted input, paragraph 2 makes an uncited claim, clause 15 references superseded case law, etc.)
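To make that concrete, here is a minimal sketch of such a stop hook in Python. Treat the details as assumptions rather than the documented interface: that the hook receives a JSON object on stdin with a "cwd" field, and that the feedback shown to the model is whatever the script writes to stderr. The TODO check and the "*.rs" file selection are hypothetical stand-ins for whatever defines your bar.

```python
import json
import re
import sys
from pathlib import Path

# Hypothetical check: leftover TODO markers. Swap in whatever defines your bar.
TODO_RE = re.compile(r"\bTODO\b")


def find_violations(path: str, text: str) -> list[str]:
    """Return one 'path:line: message' entry per offending line."""
    return [
        f"{path}:{lineno}: unresolved TODO"
        for lineno, line in enumerate(text.splitlines(), start=1)
        if TODO_RE.search(line)
    ]


def verdict(problems: list[str]) -> tuple[int, str]:
    """Exit code 2 blocks the stop; exit code 0 lets the agent finish."""
    if problems:
        return 2, "Not done yet. Fix these first:\n" + "\n".join(problems)
    return 0, ""


def main() -> int:
    payload = json.loads(sys.stdin.read() or "{}")
    # Assumption: the hook input includes the project directory as "cwd".
    root = Path(payload.get("cwd", "."))
    problems: list[str] = []
    for path in sorted(root.rglob("*.rs")):  # hypothetical file selection
        problems += find_violations(str(path), path.read_text(encoding="utf-8"))
    code, feedback = verdict(problems)
    if feedback:
        print(feedback, file=sys.stderr)
    return code


# When installed as the hook command, finish the script with:
# sys.exit(main())
```

The point of the "path:line: message" format is that the model gets told where to look, not just that it failed.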

1 comments

Anecdotally, a few weeks into a Rust agent-first project, we're still trying to get the agent to maintain a minimum of coding discipline (e.g. don't use sync Mutex in tokio code). So far, the agent seems more interested in deactivating the linters than in complying.

Security? At this stage, I'm a bit afraid that it's a joke more than anything else.
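For what it's worth, a rule like the sync Mutex one can be checked outside the agent's reach. Below is a rough sketch in Python, not a real linter: it is a plain text scan that assumes any file mentioning "tokio" is async code, and it will miss aliased imports. The function name and message wording are made up for illustration.

```python
import re

# Matches `std::sync::Mutex` and grouped imports like `std::sync::{Arc, Mutex}`,
# but not `std::sync::MutexGuard` or `tokio::sync::Mutex`.
SYNC_MUTEX_RE = re.compile(
    r"\bstd::sync::(?:\{[^}]*\bMutex\b[^}]*\}|Mutex\b)"
)


def sync_mutex_in_tokio_code(path: str, source: str) -> list[str]:
    """Report lines that reference std::sync::Mutex in a file that uses tokio."""
    # Crude heuristic (assumption): a file mentioning tokio is async code.
    if "tokio" not in source:
        return []
    return [
        f"{path}:{lineno}: std::sync::Mutex used in tokio code"
        for lineno, line in enumerate(source.splitlines(), start=1)
        if SYNC_MUTEX_RE.search(line)
    ]
```

Because the check lives in a script rather than in the lint configuration, switching off a lint does not switch it off.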

That should be solvable by denying permission to edit the lint files, with a message saying that lint files cannot be edited and that workarounds (sed, scripting, etc.) are not to be used.

You could also use hooks to block running of scripts for some number of turns after an attempt to cheat.
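A rough sketch of the deny-edits part, written as a hook that runs before each tool call. The assumptions here: the hook gets JSON on stdin with "tool_name" and "tool_input.file_path" fields, the editing tools are called Edit, Write, and MultiEdit, and exit code 2 blocks the call. The protected file list is only an example.

```python
import json
from pathlib import PurePath

# Example list only; adjust to the lint configs your project actually has.
PROTECTED = {"clippy.toml", ".clippy.toml", "rustfmt.toml", ".rustfmt.toml", "deny.toml"}
# Assumed names of the agent's file-editing tools.
EDIT_TOOLS = {"Edit", "Write", "MultiEdit"}


def check_tool_call(stdin_text: str) -> tuple[int, str]:
    """Return (exit_code, message); exit code 2 means the call is blocked."""
    payload = json.loads(stdin_text)
    if payload.get("tool_name") not in EDIT_TOOLS:
        return 0, ""
    target = payload.get("tool_input", {}).get("file_path", "")
    if PurePath(target).name in PROTECTED:
        return 2, (
            f"{target} is a lint configuration file and cannot be edited. "
            "Do not work around this with sed or scripts; fix the code instead."
        )
    return 0, ""


# In the hook script itself (needs `import sys`):
# code, message = check_tool_call(sys.stdin.read())
# print(message, file=sys.stderr)
# sys.exit(code)
```

This only covers the editing tools. Shell commands that rewrite the same files would need their own check.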

The agent can disable the lints inline, so that's not sufficient.

Also, I haven't found a cross-platform + cross-agent mechanism to set permissions. Much less one that works.

Right now, I'm working on a hook that checks for changes in source files, but the plug-in system (at least opencode's) seems quite buggy.
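One agent-independent way to catch the inline case is to scan the diff itself for newly added suppressions. The sketch below is in Python and assumes you feed it unified diff text (for example from `git diff`). It only looks for `#[allow(...)]`, `#![allow(...)]`, and the `expect` variants on added lines, so something like `cfg_attr` wrapping would slip through. The function name is made up.

```python
import re

# Added lines containing #[allow(...)], #![allow(...)], or the expect variants.
SUPPRESS_RE = re.compile(r"#!?\[\s*(?:allow|expect)\s*\(")
# Hunk header, e.g. "@@ -10,3 +10,4 @@"; group 1 is the new-file start line.
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def added_suppressions(diff_text: str) -> list[str]:
    """Return 'path:line: message' for each lint suppression added in the diff."""
    findings: list[str] = []
    path = None
    lineno = 0
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            path = line[4:].removeprefix("b/")
        elif (match := HUNK_RE.match(line)):
            lineno = int(match.group(1))
        elif line.startswith("+"):
            if path and SUPPRESS_RE.search(line[1:]):
                findings.append(
                    f"{path}:{lineno}: new lint suppression: {line[1:].strip()}"
                )
            lineno += 1
        elif line.startswith(" "):
            lineno += 1  # context line; removed lines do not advance the count
    return findings
```

Since it works on the diff rather than through any one agent's plug-in API, the same script could run from a pre-commit hook or CI as well.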