by gck1
6 days ago
Yes, that's true. Excluding Fable, OAI models are the most refusal-heavy. However, I'd rather get a refusal than a response with poisoned output. Since there's currently no way to verify whether poisoning happened, I don't trust Anthropic anymore, regardless of what they say. But my trust in OAI is also brittle: what if they do it too, or start doing it? I want a verifiable way to know that the prompt I sent was the prompt the model received. I also want to know if anything was injected. I understand they may not be able to reveal the exact steering, but at least give me the steering category and its hash or something.
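To make the "category and hash" idea concrete, here is a rough sketch of what a client-side check could look like. As far as I know no provider returns anything like this today, so the receipt format and every field name here (`prompt_sha256`, `injections`, `category`, `sha256`) are hypothetical. It also only checks that the receipt is consistent with what I sent. It can't prove the provider is reporting honestly, which would need some form of attestation on their side.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hex SHA-256 digest of a UTF-8 encoded string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def check_receipt(sent_prompt: str, receipt: dict) -> tuple[bool, list[tuple[str, str]]]:
    """Compare a hypothetical provider receipt against the prompt we sent.

    Returns (prompt_matches, injections), where injections is a list of
    (category, sha256) pairs the provider disclosed. The receipt layout
    is an assumption, not any real API.
    """
    prompt_matches = receipt.get("prompt_sha256") == sha256_hex(sent_prompt)
    injections = [
        (item["category"], item["sha256"])
        for item in receipt.get("injections", [])
    ]
    return prompt_matches, injections
```

Usage would be something like `ok, injected = check_receipt(my_prompt, response_receipt)`: if `ok` is false the prompt was altered somewhere along the way, and `injected` lists what was added by category without revealing the text itself.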