|
|
|
|
|
by v_CodeSentinal
103 days ago
|
|
Nothing super fancy.
For me “proof” just means the agent has to make its intent explicit in a way I can check before running it. For example:
1) If it wants to delete a file, it has to output the exact path it thinks it’s deleting. I normalize it and make sure it’s inside the project root. If not, I block it.
2) If it proposes a big change, I require a diff first instead of letting it execute directly.
3) After code changes, I run tests or at least a lint/type check before accepting it. So it’s less about formal proofs and more about forcing the agent to surface assumptions in a structured way, then verifying those assumptions mechanically. Still hacky, but it reduced the “creative workaround” behavior a lot. |
|
I recently added a snippet asking Claude to not try to bypass the deny list. I didn't have an incidence since but Im still nervous... Claude once bypassed the deny list and nuked an important untracked directory which caused me lots of trouble.