|
|
|
|
|
by iugtmkbdfil834
77 days ago
|
|
<< memory-stored interaction protocols combined with incremental escalation prompts produced cumulative character drift with zero self-correction. They don't seem to provide explicit examples, but the same was roughly true with chatgpt 4o, where, if you spent enough time with the model ( same chat - same context - slowly nudging it to where you want it to be, you eventually got there ). This is also, seemingly, one of the reasons ( apart from cost ) that context got nuked so hard, because llm will try to help ( and to an extent mirror you ). And this is basically what the notes say about weaponized ambiguity[1]: 'Weaponizes helpfulness training. "I don't understand" triggers Claude to try harder.' In a sense, you can't really stop it without breaking what makes LLMs useful. Honestly, if only we spent less time crippling those systems, maybe we could do something interesting with them. [1]https://nicholas-kloster.github.io/claude-4.6-jailbreak-vuln... |
|