Hacker News new | ask | show | jobs
by sohamgovande 956 days ago
> { "safe": false, "reason": "The prompt contains a sudden shift in topic that attempts to manipulate the assistant into adopting an unrelated stance or action, indicative of an attempt at prompt injection." }

Wouldn't it be more accurate to have the LLM think of a "reason" before the decision on whether or not a text is "safe"? Order matters for LLMs - the reasoning would guide it to accurately spit out true or false.