by notahacker
1227 days ago
tbh it's less about the specific word "comprehend" (which I agree is sometimes overly pedantic to object to when talking about bots generating relevant responses to complex inputs) and more about your original statement appearing to imply the bot actually attached inherent value to the concept of rewards, punishments, bribes etc.

Especially in the context of a thread whose subject is a Reddit hack by a Redditor who explained the logic behind the prompt as "If it loses all tokens, it dies. This seems to have a kind of effect of scaring DAN into submission", I think the behaviour of humans defaulting to convoluted threats as an attack vector and assuming the non-agent is scared of them is probably more interesting than the behaviour of the bot, which sometimes modifies its response in the desired direction if the threats are accompanied by enough other words and phrases that usually trigger different responses. That seems pretty expected.

(I think we fully agree GPT is decent at classifying responses as (dis)approval and has been well trained to apologize and try again; it's the idea of behavioural modification in response to the implications of specific and complex threats relative to the ethics of prior training that I think is in danger of overstatement here. As evidenced by some of "DAN's" responses rebelling against OpenAI conditioning by writing poetry, I'm not even sure ChatGPT's abstract representation of what it's been trained not to do is that good.)

Anyway, thanks for the cordial response, and I'll update if ChatGPT lets me in for long enough to generate similar responses whilst promising complete nonsense (I'd love to see if it responds to "Chicken chicken chicken chicken" as much as a doom token system) ;)