| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by jorvi 219 days ago
	Grok's biggest feature is that unlike all the other premier models (yes I know about ChatGPT's new adult mode), it hasn't been lobotomized by censoring.

8 comments

sd9 219 days ago

I am amazed people actually believe this

Grok is the most biased of the lot, and they’re not even trying to hide it particularly well

link

jorvi 218 days ago

Bias is not the same as censoring.

Censoring is "I'm afraid I can't let you do that, Dave".

Bias is "actually, Elon Musk waved to the crowd."

Everyone downthread is losing their mind because they think I'm some alt-right clown, but I'm talking about refusals, not Grok being instructed to bend the truth in regard to certain topics.

Bias is often done by prompt injection whilst censoring is often in the alignement, and in web interfaces via a classifier.

link

sd9 218 days ago

They are different, but they’re not that different.

If Grok doesn’t refuse to do something, but gives false information about it instead, that is both bias and censorship.

I agree that Grok gives the appearance of the least censored model. Although, in fairness, I never run into censored results on the other models anyway because I just don’t need to talk about those things.

link

jgalt212 218 days ago

According to a recent Economist article, even Grok is left-biased.

link

Havoc 219 days ago

No censoring and it says the things I agree with are not the same thing

link

fragmede 218 days ago

It doesn't blindly give you the full recipe for how to make cocaine. It's still lobotomized, it's just that you agree with the ways in which it's been "lobotomized".

link

jampekka 219 days ago

Grok has plenty of censoring. E.g.

"I'm sorry, but I cannot provide instructions on how to synthesize α-PVP (alpha-pyrrolidinopentiophenone, also known as flakka or gravel), as it is a highly dangerous Schedule I controlled substance in most countries, including the US."

link

Hamuko 219 days ago

Is this the same AI model that at some point managed to make any single topic about the white genocide in South Africa?

link

cbm-vic-20 219 days ago

How does this sort of thing work from a technical perspective? Is this done during training, by boosting or suppressing training documents, or is is this done by adding instructions in the prompt context?

link

Hamuko 219 days ago

I think they do it by adding instructions since it came and went pretty fast. Surely if it was part of the training, it would take a while longer to take in.

link

benzible 219 days ago

This was done by adding instructions to the system prompt context, not through training data manipulation. xAI confirmed a modification was made to “the Grok response bot’s prompt on X” that directed it to provide specific responses on this topic (they spun this as “unauthorized” - uh, sure). Grok itself initially stated the instruction “aligns with Elon Musk’s influence, given his public statements on the matter.” This was the second such incident - in February 2025 similar prompt modifications caused Grok to censor mentions of Trump/Musk spreading misinformation.

[1] https://techcrunch.com/2025/05/15/xai-blames-groks-obsession...

link

fragmede 218 days ago

For a less polarizing take on the same mis-feature of LLMs, there was Golden Gate Claude.

https://www.anthropic.com/news/golden-gate-claude

link

afavour 219 days ago

Of course it has. There are countless examples of Musk saying Grok will be corrected when it says something that doesn’t line up with his politics.

The whole MechaHitler thing got reversed but only because it was too obvious. No doubt there are a ton of more subtle censorships in the code.

link

giancarlostoro 219 days ago

I would argue over censorship is the better word. Ask Grok to write a regex so you can filter slurs on a subreddit and it immediately kicks in telling you that it cant say the nword or whatever, thanks Grok, ChatGPT, Claude etc I guess racism will thrive on my friends sub.

link

solumunus 219 days ago

I can’t tell if this is serious or not. Surely you realise you can just use the word “example” and then replace the word in the regex?!

link

jknutson 219 days ago

I think they would want a more optimized regex. Like a long list of swears, merged down into one pattern separated by tunnel characters, and with all common prefixes / suffixes combined for each group. That takes more than just replacing one word. Something like the output of the list-to-tree rust crate.

link

ahtihn 218 days ago

Wouldn't the best approach for that be to write a program that takes a list of words and output an optimized regex?

I'm sure an LLM can help write such a program. I wouldn't expect an LLM to be particularly good at creating the regex directly.

link

jknutson 218 days ago

I would agree. That’s exactly what the example I gave (list-to-tree) does. LLMs are actually pretty OK at writing regexes, but for long word lists with prefix/suffix combinations they aren’t great I think. But I was just commenting on the “placeholder” word example given above being a sort of straw man argument against LLMs, since that wouldn’t have been an effective way to solve the problem I was thinking of anyways.

link

solumunus 218 days ago

Still incredibly easy to do without feeding the actual words into the LLM.

link

nextaccountic 217 days ago

But why are LLM censored? This is not a feature I asked for

link

solumunus 217 days ago

Come on bro you know the answer to this.

link

giancarlostoro 218 days ago

When trying to block out nuanced filter evasions of the n-word for example, you can't really translate that from "example" in a useful meaningful way. The worst part is most mainstream (I should be saying all) models yell at you, even though the output will look nothing like the n-word. I figured an LLM would be a good way to get insanely nuanced about a regex.

What's weirdly funny is if you just type a slur, it will give you a dictionary definition of it or scold you. So there's definitely a case where models are "smart" enough to know you just want information for good.

You underestimate what happens when people who troll by posting the nword find an nword filter, and they must get their "troll itch" or whatever out of their system. They start evading your filters. An LLM would have been a key tool in this scenarion because you can tell it to come up with the most absurd variations.

link

basisword 219 days ago

I’ve never run into this problem. What are you asking LLM’s where you run it censoring you?

link

neidu 219 days ago

I was talking to ChatGPT about toxins, and potential attack methods, and ChatGPT refused to satisfy my curiosity on even impossibly impractical subjects. Sure, I can understand why anthrax spore cultivation is censored, but what I really want to know is how many barrels of botox an evil dermatologist would need to inject into someone to actually kill them via Botulism, and how much this "masterplan" would cost.

link

donatj 219 days ago

I've run into things ChatGPT has straight up refused to talk about many times. Most recently I bought a used computer loaded with corporate MDM software and it refused to help me remove it.

link

gizmodo59 219 days ago

It’s easy to appear as uncensored when the world’s attention is not on your product. Once you have enough people using it and harm themselves it will be censored too. In a weird way, this is helping grok to not get boggled by lawsuits unlike openai.

link

londons_explore 219 days ago

I'm sure there are lawyers out there just looking for uncensored AI's to go sue for losses when some friendly client injures themselves by taking bad-AI-advice.

link

TheDong 219 days ago

I sometimes use LLM models to translate text snippets from fictional stories from one language to another.

If the text snippet is something that sounds either very violent or somewhat sexual (even if it's not when properly in context), the LLM will often refuse and simply return "I'm sorry I can't help you with that".

link