| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by danpalmer 8 days ago
	What a joke. Must make it pretty easy to poison a session, you don't need to persuade the model about anything, just trigger its security controls, ideally after as much context as possible, but before it has generated any useful output.

1 comments

After all, what is roleplay or games but a jailbreak of guard rails? :]

I've even had it refuse CTFs knowing it is a CTF with blatantly obvious CTF flag, no actual application