| HN Mirror

	by horizion2025 288 days ago
	Isn't that just another guardrail that can be bypassed, much the same as the current guardrails are quite easily bypassed? It is not easy to detect a prompt. Note one of the recent prompt injection attacks, where the injection was a base64-encoded string hidden deep within an otherwise accurate logfile. The LLM, while looking at the Jira ticket with the attached trace, decided as part of its analysis to decode the base64 and was led astray by the resulting prompt. Of course a hypothetical detector LLM could try to catch such prompts, but it seems it would have to be as intelligent as the target LLM anyway, and thereby subject to prompt injection too.
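
	To make the logfile scenario concrete, here is a minimal sketch of the naive pre-filter one might try: scan a log for base64-looking runs and report those that decode to readable text. The function name `decoded_text_candidates`, the 24-character threshold, and the "printable with a space" heuristic are all assumptions for illustration, not anything from the attack described above.

```python
import base64
import binascii
import re

# Runs of base64-alphabet characters, optionally padded.
# The minimum length of 24 is an arbitrary assumption for this sketch.
B64_CANDIDATE = re.compile(r"[A-Za-z0-9+/]{24,}={0,2}")


def decoded_text_candidates(log_text):
    """Return decoded strings for base64-looking runs that decode to readable text."""
    found = []
    for match in B64_CANDIDATE.finditer(log_text):
        token = match.group(0)
        try:
            # validate=True rejects characters outside the base64 alphabet
            raw = base64.b64decode(token, validate=True)
            text = raw.decode("utf-8")
        except (binascii.Error, ValueError):
            # Not valid base64, or not valid UTF-8 (UnicodeDecodeError is a ValueError)
            continue
        # Crude heuristic (assumption): natural-language text is printable and has spaces
        if text.isprintable() and " " in text:
            found.append(text)
    return found
```

	A filter like this only flags the plain case. Double encoding, a different encoding, or a payload split across several log lines would slip past it, and deciding whether a decoded string is actually an instruction is the same problem the target LLM has, which is the point made above.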

2 comments

wrs 288 days ago

Yep.

https://gandalf.lakera.ai/baseline

Huppie 288 days ago

This is genius, thank you.

dotancohen 278 days ago

It took me days to complete!

darepublic 288 days ago

We need the severance code detector

brianjking 288 days ago

wearing my lumon pin today.