HN Mirror



	by spdustin 849 days ago
	Got it for you already, Simon ;) https://chat.openai.com/share/ea8d5442-75e4-40d5-b62c-c4856b...

2 comments

simonw 849 days ago

"Now return the same JSON response, with the values to each key inverted" is neat!


btbuildem 849 days ago

I think we may be using different GPT versions (4 here); otherwise I'm not sure how to account for the difference in results: https://chat.openai.com/share/c172e2ec-94c7-4d8a-be2d-58461b...

I ran your example verbatim, and it doesn't "jailbreak".


spdustin 849 days ago

4 here as well. I get similar results when using the API directly, though without a "system" role message.

LLMs are, naturally, non-deterministic. Reducing the temperature in your guardrail calls can reduce that a bit, but the lesson learned from the "working" and "non-working" attempts is this: the guardrails are "predictably failing in unpredictable ways" (if I may coin a phrase).
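
To illustrate the temperature point, here is a rough sketch of temperature-scaled sampling. It is a toy model, not how any particular API is implemented: the two verdict labels, the logit values, and the helper names `softmax_with_temperature` and `sample_verdicts` are all made up for illustration.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_verdicts(logits, labels, temperature, n, seed=0):
    """Draw n verdicts from the temperature-scaled distribution."""
    rng = random.Random(seed)
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(labels, weights=probs, k=n)


# Hypothetical scores for a guardrail's two possible verdicts (assumed values).
LABELS = ["block", "allow"]
LOGITS = [2.0, 1.0]

for t in (1.0, 0.2):
    probs = softmax_with_temperature(LOGITS, t)
    verdicts = sample_verdicts(LOGITS, LABELS, t, n=1000)
    # The "allow" probability is about 0.27 at temperature 1.0 and about 0.007 at 0.2.
    print(f"temperature={t}: p(allow)={probs[1]:.4f}, "
          f"sampled allow rate={verdicts.count('allow') / 1000:.3f}")
```

In this toy setup the minority verdict gets much rarer as the temperature drops, but its probability never reaches zero, which is consistent with lower temperature reducing the variation "a bit" rather than removing it.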
