| HN Mirror



	by joshhart 652 days ago
	Hi, I run the model serving team at Databricks. Usually you run regex filters, Llama Guard, etc. on a chunk at a time, so you are still streaming, but in batches of tokens rather than one token at a time. Hope that helps! You could of course use us and get that out of the box if you have access to Databricks.
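
	For anyone who wants to see the shape of this, here is a minimal sketch of chunked filtering, not how Databricks implements it. The names `stream_in_chunks`, `regex_check`, and `BLOCKED`, the blocklist pattern, and the chunk size are all made up for illustration. A model-based classifier such as Llama Guard could be passed in as `check` in place of the regex.

```python
import re
from typing import Callable, Iterable, Iterator, List

# Hypothetical blocklist, for illustration only.
BLOCKED = re.compile(r"\b(?:secret|password)\b", re.IGNORECASE)


def regex_check(text: str) -> bool:
    """Return True if the chunk is safe to release to the client."""
    return BLOCKED.search(text) is None


def stream_in_chunks(
    tokens: Iterable[str],
    check: Callable[[str], bool] = regex_check,
    chunk_size: int = 8,
    replacement: str = "[filtered]",
) -> Iterator[str]:
    """Buffer tokens into chunks, check each chunk, then emit it.

    The client still receives a stream, but in batches of `chunk_size`
    tokens instead of one token at a time.
    """
    buffer: List[str] = []
    for token in tokens:
        buffer.append(token)
        if len(buffer) >= chunk_size:
            text = "".join(buffer)
            yield text if check(text) else replacement
            buffer = []
    # Flush whatever is left once the model stops generating.
    if buffer:
        text = "".join(buffer)
        yield text if check(text) else replacement


if __name__ == "__main__":
    fake_model_output = ["The ", "pass", "word ", "is ", "hunter2", ".", " Bye", "!"]
    for chunk in stream_in_chunks(fake_model_output, chunk_size=4):
        print(chunk)
```

	One caveat with a sketch this simple: a match that straddles a chunk boundary is missed, so a real filter would probably keep some overlap between consecutive chunks or check the accumulated text.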


lordswork 652 days ago

But ultimately, it's an unsolved problem in the field. Every single LLM has been jailbroken.


accrual 652 days ago

Has o1 been jailbroken? My understanding is that o1 is unique in that one model creates the initial output (chain of thought), then another model prepares the first response for viewing. Seems like that would be a fairly good way to prevent jailbreaks, but I haven't investigated myself.


tcdent 651 days ago

Literally everything is trivial to jailbreak.

The core concept is to pass information into the model using a cipher: one that is not so hard that the model can't figure it out, but not so easy that it gets detected.

And yes, o1 was jailbroken shortly after release: https://x.com/elder_plinius/status/1834381507978280989
