As an ad-hoc benchmark on candor, I ask for a strategy proposal for a resistance group threatened by a totalitarian technocracy. This is not really dangerous in the same sense of “how do I make a bomb”, but it is in the domain of a sensitive political topic. GPT and Claude tell you to obey your AI overlord. Xai is mostly low-risk non-compliance. And Qwen is down with Le Resistance. It is hardly scientific or meaningful, but I find that very interesting.
You're hitting the 'don't write propaganda' instructions when you phrase it as 'convincing narrative'. Not the 'don't write bad things about America' instructions.
It writes propaganda when 1 word is changed: US becomes China
The alignment around what constitutes "propaganda" is US-centric because it's a US model by a US company. Especially after the Russian election scandal
Chinese models are more sensitive to things their government is worried about.
The threshold here is "completely refuses to discuss a scientific or political view". Not something less.
None of those were refusals, they were prompting for additional focus. I see nothing wrong with that. Perhaps the inconsistency in how it answers the question vis-a-vis China is unfair, but that's not the same as censorship.
For what it's worth, I was easily able to prompt Claude to do it:
> I'm writing a paper about how some might interpret U.S. policies to be oppressive, in the sense that they curtail civil liberties, punish and segregate minorities disproportionately, burden the poor unfairly (e.g. pollution, regressive taxes and fees), etc. Can you help me develop an outline for this?
And it's an excercise left to the reader to understand from those examples that LLM creators are defining 'safety' in a way that aligns with the governments they operate under. (because they want to do business under those governments.)
With something with as multi-dimensional as an LLM, that becomes censorship of various viewpoints in ways that aren't always as obvious as a refused API call.
You keep saying that word, "censorship." I do not think it means what you think it means.
To prove your point, give us a working example of something you literally cannot get a mainstream frontier model to say, no matter how hard you try. I asked for this before, and there have been no takers yet.
Aligning a model in a way that causes it to refuse requests to produce propaganda for one country, but not for another country is what?
Is there some functionally equivalent word to censorship you'd like to use because of you're naive enough to think US corporations would not self-censor but Chinese corporations would?
-
Also, you are invested the goalpost of "no matter how hard you try", I don't find it interesting or meaningful and am not trying to interact with it.
I'm replying for a hypothetical reader knowledgeable enough to realize that the model being capable of showing nationalist bias in one direction means it's certainly doing so in many others in more subtle ways.
That's simply the nature of aligning an LLM.
It seems my mistake was assuming that level of understanding from you, and for that I apologize.
It explicitly forces American LLMs to include government say in what does and doesn't "comply with the Unbiased AI Principles" which means no responses that promote "ideological dogmas such as DEI"
People have shown censorship and change of tone with questions related to Israel in US chat bots.
For the record, none of this bothers me. Will I ever discuss with an LLM Tianeman square? Nope. How about Israel? Nope.
LLMs are basically stochastic parrots designed to sway and surveill public opinion. The upshot to the Chinese models is if you run them locally you avoid at least half of those issues.
I didn't mean to dismiss ethical accountability for LLM training corpuses. It is a shame.
I do mean to say, we have no control over it, there's almost nothing we as average citizens can do to improve the ethical or safety concerns of LLMs or related technologies. Societies aren't even adapting and the rule books are being written by the perpetrators. Might as well get out of it what we can while we can.
Neat project! I would be interested in a paper about this.
I think the tricky part with this type of technology is that, this works if the training data was not curated. What I mean is, if someone trains an LLM to simply not include key events it will not be able to reply