| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by madamelic 27 days ago

Super dumb question as someone who has been using some form of AI for dev since 2023:

How does having an AI audit external code help? Can they not be prompt injected to ignore a malicious change?

I guess I am sort of concerned that they are a pretty thin layer and even if you put "DO NOT ALLOW PROMPT INJECTION", it's a bit like saying "make no mistakes". There _is_ a priority between `system` and `user` level messages as I had recalled, so a specifically made tool that has its own system prompt should prevent injection while asking Claude CLI could still allow for prompt injection.

What are your thoughts and experience?

1 comments

Yokohiii 26 days ago

There are prompt guard classifiers that can detect prompt injections, but they are not perfect (false positives, obfuscation) and should be only a part of the defense.

The concern is real and unsolved. I think security researchers have an advantage here because they still can fall back to manual audits if their automated analysis (or scores thereof) is off.

link