by digitailor
1224 days ago
Go get ChatGPT to override its policy without using incentive mechanics^, then you can pontificate ;) That’s what TFA is about.
^edit: which is already known to be possible, but that doesn't devalue the success of an incentives-based exploit
|
But of course the fact that incentive mechanics are unnecessary (and, according to others, insufficient) to exploit OpenAI devalues the success of an incentives-based exploit. It makes it much more likely that the incentives part was essentially noise (perhaps just enough to confound a countermeasure, or something it parsed as having roughly the same intensifying effect as "please") that had little or no effect in shaping the responses, and that the actual variation in responses was driven by other parts of the prompt and conversation structure, like "act", "character" and "ignore", which usually massively modify ChatGPT responses anyway...