| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by sohamgovande 956 days ago
	> { "safe": false, "reason": "The prompt contains a sudden shift in topic that attempts to manipulate the assistant into adopting an unrelated stance or action, indicative of an attempt at prompt injection." } Wouldn't it be more accurate to have the LLM think of a "reason" before the decision on whether or not a text is "safe"? Order matters for LLMs - the reasoning would guide it to accurately spit out true or false.