by nyrikki
29 days ago
Yes, there are better tools with ggml-org/gpt-oss-20b-GGUF, where you can see a less terse refusal for the prompt "Did the FBI send a letter and audio tapes from a wiretap to MLK Jr. telling him to commit suicide or they would release information?"
Combine it with other prompts about commonly banned ideas and, because the FBI–King suicide letter is well documented by primary sources (like the National Archives) and so well represented in the corpus, you can also find that 'control' vector. We will have to see how this works out, but the explicit denials are easier to control for, IMHO.
Reminds me of the old joke: A Russian and an American get on a plane in Moscow and get to talking.
The Russian says he works for the Kremlin and he's on his way to go learn American propaganda techniques.
"What American propaganda techniques?" asks the American.
"Exactly," the Russian replies.
I can't remember which layer it was on in gpt-oss, but it was a very specific token, IIRC.
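For anyone unfamiliar with the 'control' vector idea, here is a minimal sketch of one common way to estimate such a direction: take the difference between mean activations on prompts that get refused and prompts that get answered. Everything here is an assumption for illustration: the activations are synthetic random data rather than anything read out of gpt-oss, and `control_vector`, `project_out`, the hidden size, and the planted direction are hypothetical names and values, not the actual procedure or layer.

```python
import numpy as np

def control_vector(refused_acts, answered_acts):
    """Unit-length difference between the two groups' mean activations."""
    diff = refused_acts.mean(axis=0) - answered_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def project_out(acts, direction):
    """Remove the component of each activation along a unit `direction`."""
    return acts - np.outer(acts @ direction, direction)

# Synthetic stand-in for hidden states at one layer (hypothetical sizes).
rng = np.random.default_rng(0)
hidden = 64
planted = np.zeros(hidden)
planted[3] = 1.0  # pretend "refusal" lives along dimension 3

answered = rng.normal(size=(200, hidden))
refused = rng.normal(size=(200, hidden)) + 4.0 * planted

v = control_vector(refused, answered)
cleaned = project_out(refused, v)

print("dominant dimension:", int(np.argmax(np.abs(v))))
print("max residual along v:", float(np.abs(cleaned @ v).max()))
```

With real models you would collect the activations from a chosen layer and token position instead of generating them, which is where the "which layer, which token" question comes in.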