Hacker News
by thrill 9 days ago
"even a small jailbreak should cause them to pull back and fix it first, right"

You do realize that an LLM's behavior emerges from a vast number of weights, don't you? You can't "fix" one weight and suddenly make everything alright. All you can do is keep probing a vast input space and check whether the outputs you manage to elicit matter or not.