| A few observations, based on a family member experimenting with DeepSeek. I'm pretty sure it was running locally. I'm not sure if it was built from source. The censorship seemed to be based on keywords, applied the input prompt and the output text. If asked about events in 1990, then asked about events in the previous year DeepSeek would start generating tokens about events in 1989. Eventually it would hit the word "Tiananmen", at which point it would partially print the word, then in response to a trigger delete all the tokens generated to date and replace them with a message to the effect of "I'm a nice AI and don't talk about such things." If the word Tiananmen was in the prompt, the "I'm a nice AI" message would immediately appear, with no tokens generated. If Tiananmen was misspelled in the prompt, the prompt would be accepted. DeepSeek would spot the spelling mistake early in its reasoning and start generating tokens until it actually got around to printing to the word Tiananmen, at which point it would delete everything and print the "nice AI" message. I'm no expert on these things, but it looked like the censorship isn't baked into the model but is an external bolt on. Does this gel with other's observations? What's the take of someone who knows more and has dived into the source code? Edit: Consensus seems to be that this instance was not being run locally. |
If this family member is experimenting with DeepSeek locally, they are an extremely unusual person and have spent upwards of $10,000 if not $200,000. [0]
> ...partially print the word, then in response to a trigger delete all the tokens generated to date and replace them...
It was not running locally. This is classic bolt-on censorship behavior. OpenAI does this if you ask certain questions too.
If everyone keeps loudly asking these questions about censorship, it seems inevitable that the political machine will realize weights can't be trivially censored. What will they do? Start imprisoning anyone who releases non-lobotomized open models. In the end, the mob will get what it wants.
[0] I am extremely surprised that a 15-year-long HN user has to ask this question, but you know what they say: the future is not fairly distributed.