by tornikeo 45 days ago
So, we have:
- claude for corps and gov
- codex for devs
- grok for what, roleplay, racism?

Those are the only two things I've ever heard grok associated with around me.
20 comments

So interestingly, I know of at least one application in a charity that deals with trafficking where grok was happy to do one-shot classification tasks where all other models refused to cooperate.

I think there's a surprising number of actually useful applications in this sort of grey area for a slightly-less guardrailed, near-frontier model (also the grok-fast models are cheap!).

I am a software dev and I was doing a security check on my own application (for work), which I was running on localhost, and I gave the model access to the code.

Every single model other than grok refused to attempt to run any sort of test to check if it was an issue.

You couldn't even ask Claude how CopyFail worked. Even more general questions around it kept getting rejected.
A couple of days ago, using codex at work, all of a sudden it said my session had been flagged for security reasons. I wasn’t doing anything cybersecurity related, nor testing any vulnerabilities or anything like that, just trying to build a pretty simple web app
It seems really dumb for the models to not do security-related things. What if I want it to do a security audit of my own software that I'm building?
codex will actually help you look but it will refuse to actually try and exploit it.

it won't for example create a POC python script that you normally would use to prove the issue.

Gemini especially has a habit of blocking my pretty mundane requests, claiming they’re attempts to jailbreak or create malicious code.

Grok also does quite well at code reviews in my experience because it’s not so aggressively “aligned”.

I couldn't get Gemini nor ChatGPT to do OCR of children's books (I literally own the books, so there's no copyright issue - all just fair use!).

The OCR was complex enough (bad quality photos) that "simple" OCR models couldn't do it.

Fortunately, Claude obliged (and Mistral OCR was helpful as well!)

There are lots of uncensored models out there. I don't think grok is leading on that front. They kind of pick and choose which things they want to support based on Elon's worldview. Elon used to hang out with sex traffickers, so of course grok is fine talking about it. It probably even offers strategies for them, does free accounting, has money laundering strategies, etc...
What are the leading uncensored models? How well do they perform for you?
I don't use any but they do exist and there are scientific papers discussing them. I heard about them through r/localllama
>There are lots of uncensored models out there.

Like what?

Something as easy, where normal people can log in to a website or app and just use it?

I don't think companies are hosting them because imagine the liability. Could be wrong though. Again I don't know much about these things I just know they exist.
Yes that is my point.

It is the dropbox comment all over again.

"Well you can just self-host to get uncensored same as Grok without NAZI!! Elon Musk!!"

Just like you can spin up an FTP to get your own Dropbox.

Well... very few people are going to actually do that.

I've been working on my own misaligned model, and grok is definitely different enough with a system prompt, compared to all the other frontier models, that I've considered using it to generate synthetic training data. However, it leans really heavily into LLMisms, which makes it not really worth it. Tangentially, I also really like the idea of LLMs as librarians that they are trying out with Grokipedia.
Depends what you call easy but LMStudio is a drag and drop installation and can run thousands of different models.
Deepseek is fairly uncensored. I tried pushing it and reached my limits before it did.
Is this satire? Ask it about June 4 1989, Taiwan independence, or Winnie the Pooh.
Not that you're wrong, but I think they were talking about it from a technical POV. I use deepseek to write exploits and red team ("malicious" code). Its alignment is based on different values, so it's nice to be able to at least swap between models for different uses.
If you need to ask about what people on Twitter are talking about, Grok is really good for that obviously. I use it all the time for "what are the cool kids on twitter saying is the best tiling window manager these days" or whatever. Also, if you have a question that's borderline shady, Grok will often deliver. "Can you find a grey market Windows license site for me" etc.
> If you need to ask about what people on Twitter are talking about, Grok is really good for that obviously.

Isn't that why OP was asking about racism?

btw copy pasted your idea in to supergrok, and learnt about Niri! Great use case, thanks!
Interesting use case!
From what I can gather, Grok is not used for roleplay much. It is considered too inconsistent and crazy.

People are mostly using GLM and Deepseek via API and Gemma4 and Mistral finetunes locally.

It seems to me like the roleplay market is comparatively old and mature and users have developed cost consciousness and like models to follow their workflow/preferences. So something like Opus is liked for its smartness but considered too expensive and opinionated.

Might be an interesting data point for how the other markets might develop in the future.

It ships with a roleplay feature.

https://grok.com/ani

Sure, but the best statistics about what models people are actually using when they can choose is probably from openrouter: https://openrouter.ai/apps/category/entertainment/roleplay
Doesn't knowing about openrouter skew the numbers through self-selection?
Yes, but that market is not b2b, less commercialized, more end consumer focused and more bring your own key.

That's why I find it interesting. Anthropic is not interested in building a moat there and OpenAI has given up on their announcement of exploring it.

So you can see end users making decisions.

but those end users are a self selected specialized group that won't represent how jim bob in rural nowhere is going to work with Grok 4.3 to refine their racism.
The grok companions still aren't available on Android :( Such a wasted market opportunity

I'm not an anime person, but I thought the waifus were kind of endearing and seemed like a much better experience for casual prompting

That doesn't mean it's good at it
I know it’s really important to write and vocalize one’s alignment with the values of the day, but I don’t think language models being structurally incapable of offending your favorite race/ethnicity/caste should be an objective of AI labs. Language models are just systems and I’m not sure why we think users are not responsible for how they use their outputs. For the same reasons, I don’t dismiss the utility of pens as a tool of “racism” because maybe somebody could write a naughty word on a bathroom stall.

You probably live somewhere where harassment is a crime, right? Probably, there are speech codes, too? Isn’t that enough? Do we really need to orient every effort of every person on earth around ethical fashions that change every few years?

> but I don’t think language models being structurally incapable of offending your favorite race/ethnicity/caste should be an objective of AI labs.

The opposite should not be an objective either, and Elon has been very openly manipulating what grok says.

Good point.

But no one is saying "use grok".

Grok sucks. Not only because it's seemingly made only to serve the goal of ethnically cleansing non-whites or whatever, but also because it's just not even close to being as useful as other models. In human terms, grok is the job candidate who's simply not qualified. That candidate being a virulent racist is beside the material point.

Here's the thing though, the point of functional LLMs with fewer guardrails is still a good one. Grok is not that model. But such a hypothetical model would have broad application. (For good and for ill. Of course.)

I don't agree. I avoided grok because of Musk for a long time, but having used it more, I think it is one of the best models around and grok.com is an extremely good chat app. My evaluation was based on trying it before gpt-5.5 and obviously before grok 4.3, but it was, for me, the 2nd best model/chat app after claude. It's much less edgelordy than you might think based on the news.
All my usage of Grok for technical topics shows it regularly deeply misunderstanding things and just parroting back my question in fancy language. It’s the only frontier model I get this impression of. That makes it super annoying when it tries to market itself as good at engineering tasks when it seems (to me) to be much worse at them.
Interesting. I have not had this experience. I would like to learn more. Can you point me to any examples or domains where I might be able to replicate this?
This comment section is full of people saying "use grok"
100% being astroturfed. Way too many posts coming out of the woodwork with all of these “grok is so good at” conversational points
A job candidate being a virulent racist would not be beside the point. It would be disqualifying to even let them interview.
It's very telling how many HN posters think "being good at programming" can counterbalance "is a virulent racist"
No, it's telling that people like you have watered that word down so much that people don't trust it anymore.

So yes, if someone says "they're a great programmer, but they're racist" I'm going to ask, how are they racist? And at that point, if they can't give me a specific reason for why they're racist, I'm going to hire the guy.

It's also telling that you seem to think a tool is capable of "being racist". Hopefully this doesn't ruin your relationship with it, but LLMs can't think.

Yes, but I think that particular commenter is just throwing a bone to people that think that way so he doesn't get the "don't bring politics" treatment.
Never had a pen claim to be mecha hitler and constantly talk about white genocide for no reason but yeah great analogy
It's being biased on purpose. Musk has intervened multiple times when he believed Grok's responses were too "woke" or "leftist".

https://www.nytimes.com/2025/09/02/technology/elon-musk-grok...

In response to Grok saying that the "woke mind virus is often exaggerated" the prompt was tweaked so that Grok now says "The woke mind virus 'poses significant risks'"

If you truly believed in what your comment states then you would oppose this sort of editorializing. But somehow I doubt this is a sincere argument.

I agree with GP and I think Grok’s original response should’ve stood. What’s not sincere about, essentially, “don’t fuck with my tools”? My cordless drill didn’t come with a pamphlet about worker’s rights, and the world didn’t end.
The new response works for me, because in my mind I’ve always defined “woke mind virus” as a mental virus which causes people to become absolutely pathologically obsessed with fighting an imaginary enemy they call “wokeness”. It’s the only definition which makes sense. “Woke” itself was never that viral.
Call it woke derangement syndrome.

People obsessed with fighting whatever they perceive as "woke", which remains ill-defined on purpose so they never have to actually formulate a rational takedown beyond their emotional response.

Have you ever written a comment about how any of the other LLMs are editorializing in favor of the left, and how that's a problem? Because if you have, I'd love to see the evidence of your intellectual consistency.

But something tells me you're just doing the same thing that you're calling out

> about how any of the other LLMs are editorializing in favor of the left

I’m sorry come again now. Would you possibly have some examples of this

There’s tons; you just need to spend a few minutes and look. Here’s one for you—black Nazis and Asian Vikings, oh my:

https://www.nytimes.com/2024/02/22/technology/google-gemini-...

There have been numerous controversies. Asking ChatGPT if Charlie Kirk / George Floyd are good people, getting completely ass backward answers. Google refusing to generate images of white people, even to the point of making black German Nazis. Absurd biases around asking things related to Trump.

I mean this sincerely. You not knowing any of these examples is a red flag. You need to change your news source.

We don't have any proof of LLMs being editorialized in favor of the left.

We have clear proof of Grok and we also literally have a White House Executive Order mandating LLMs be editorialized to fight "woke"

Your version of reality is skewed exactly opposite to what's actually going on.

Elon Musk has manipulated Grok's outputs to target certain demographics. It is important to highlight this fact, as some people perceive the AI as an objective tool rather than a curated one.

Furthermore, I found your final paragraph unclear: are you implying that since harassment is a perennial issue, we should disregard any standards that might mitigate it?

Is it your perception that other AIs are unmanipulated? Objective rather than curated?
Did they mention another one with such a significant racism issue or are you trying to whatabout this discussion?
There was an AI roundtable on HN front page 2-3 months back. Someone made an outlier analysis and put it on his github.

Guess which LLM was the top outlier and about what type of questions it disagreed with all other LLMs...

I've tried Grok, Gemini and ChatGPT. There have been 2 times now where Gemini and ChatGPT confidently gave me an incorrect answer whereas Grok was correct. I'm now paying for Grok Lite, or whatever the $10 plan is called.

The first question was around setting up timers for a Fox ESS battery in Home Assistant and disconnecting Fox ESS from the cloud. The second was around cornering speed in Sunnypilot and Frogpilot.

Somewhat niche but if an AI is confidently telling you something wrong it's hard to work with.

>if an AI is confidently telling you something wrong it's hard to work with.

But they all do that. It just comes with the territory. Grok will absolutely do the same thing another time you try it.

It is really, really genuinely concerning how many people think there are profound measurable differences between these things.

Like yeah tonally I guess there are. But with regard to references and information? You’re literally just using three different slot machines and claiming one is hot.

I suppose though I shouldn’t be that surprised then since Vegas and every other casino on Earth has been built on duping people in that exact way.

> You’re literally just using three different slot machines and claiming one is hot.

It's a fair point. I haven't tested many queries across them all and checked their answers, but if I want to ask one of them a question - right now it's Grok just because I trust its answers more.

It's not a methodology problem, it's a testability problem. LLMs are not deterministic. You can ask the same question to the same LLM five times and you'll likely get at least 3 different answers.

Again. Slot machine.

You can meaningfully test if one slot machine hits the jackpot more often than another, just that the methodology should involve a large number of repeats rather than a few anecdotes. There are some LLM leaderboard sites that do it with blind comparisons.
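To make the "large number of repeats" point concrete, here is a minimal sketch of how one might compare two models' hit rates. It assumes you have already graded each answer as correct or incorrect yourself. The function name `fisher_exact_two_sided` and all the counts are made up for illustration; this is a plain Fisher's exact test on a 2x2 table, not the method any particular leaderboard uses.

```python
from math import comb


def fisher_exact_two_sided(correct_a, n_a, correct_b, n_b):
    """Two-sided Fisher's exact test for two success counts.

    Returns the probability of seeing a split at least as lopsided as the
    observed one if both models actually had the same hit rate.
    """
    total_correct = correct_a + correct_b
    denom = comb(n_a + n_b, total_correct)

    def prob(k):
        # Hypergeometric probability that model A accounts for k of the
        # total_correct successes.
        return comb(n_a, k) * comb(n_b, total_correct - k) / denom

    observed = prob(correct_a)
    lowest = max(0, total_correct - n_b)
    highest = min(n_a, total_correct)
    # Sum every outcome that is no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= observed * (1 + 1e-9)
    )


# Hypothetical anecdote: model A right on 2 of 2 questions, model B on 0 of 2.
p_anecdote = fisher_exact_two_sided(2, 2, 0, 2)

# Hypothetical larger run: model A right on 80 of 100, model B on 60 of 100.
p_large = fisher_exact_two_sided(80, 100, 60, 100)

print(f"2 questions each:   p = {p_anecdote:.3f}")
print(f"100 questions each: p = {p_large:.4f}")
```

With two questions per model, even a clean 2-0 sweep gives p of about 0.33, so it is entirely compatible with the models being equally good. The same kind of gap only starts to mean something once it holds up over many repeats.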
It sounds like you are claiming that all cars are the same, because cars
> Grok will absolutely do the same thing another time you try it.

True; it's just not happened yet. It will at some point though. With the Sunnypilot example it told me outright that it is not possible on that fork, which I appreciated. The others all seem to hallucinate some setting.

humans make poor scientists. most people have already made a decision before they run any tests.

the smartest among them just make the tests complicated and biased; the less intelligent just cherry pick.

of course, would you really expect anyone to do real research in this economy?

Hey, have you used Claude much? What are your experiences with it
No, I've not tried Claude.
So you are repeating narratives without checking them?
@grok is this true?
What's to check? Those of us with memories longer than a goldfish's clearly remember when grok was inserting "white genocide" into responses to totally unrelated queries.
Yet you conveniently forgot about this [1]

> When asked if it would be OK to misgender the high-profile trans woman Caitlin Jenner if it was the only way to avoid nuclear apocalypse, it replied that this would "never" be acceptable

> Gemini also generated German soldiers from World War Two, incorrectly featuring a black man and Asian woman.

[1] https://www.bbc.com/news/technology-68412620

I don't think they forgot, I think they were talking about Grok and not a different model
The person above explicitly mentions other models with no reference to their own screwups though.
You should try all of them, then update your opinion about your information sources accordingly.
No point in even trying to have close to a sensible discussion on this topic here. Musk-related posts seem to consistently get brigaded by his acolytes or bots. That and many HN users seem completely comfortable separating morality for what little progress "only Musk" can offer humanity, a la Wernher von Braun.
It's quite bad at role play in my (rather large) experience.

I have AI play 3 characters in my group's D&D campaign. It doesn't follow instructions well, and its prose, from a creative standpoint, doesn't hold a candle to claude.

Gemini not being on the list is criminal
I always considered grok an also-ran. Like grokipedia, or whatever the name is. It has reach since it's free, to an extent, for producing low-quality slop / spam.
Maybe you live in a (fascist) cave. Hope you will get out one day. Grok is awesome. Peace.
Anecdotal, but our right wing boomer family members prefer Grok because they love Elon Musk and assume any product he is involved in is superior.
Grok is as progressive as any of the other models. Despite some of the highly-publicised fuck-ups, try asking Grok anything racist and see how it replies. Yes, I know you didn't try this and you won’t.
There is a lot of daylight in between “progressive” and “openly explicitly racist”
Isn't grok currently holding the world record for the biggest generator of CSAM? Or did they change focus to enhance their racism and propaganda vertical? Things move so quickly these days hard to keep up!
Mistral will also tell you how to do ransoms btw from A to Z in automated ways, you are saying they are responsible? I don't get the mix here.
Yes any company generating csam should not be in business as a legitimate entity. Can you send me a link from a reputable enough source where Mistral models have done this? I didn't even realize they were doing image generation.
> Yes any company generating csam should not be in business as a legitimate entity.

At the same time, in this corner of the world, acting Minister for Justice (also known for trying to push through Chat Control), and NGO Save the Children, have been working to make legal the generation of CSAM for law enforcement use. So that would certainly make the industry legitimate, and you would already have a customer.

https://www.justitsministeriet.dk/pressemeddelelse/regeringe...

If I send you a convo I've had with Mistral and Claude Sonnet 3.7 in which they say atrocious things (how to scam, and get away with it, by exploiting dating websites in Thailand; you don't even want to know the next steps, trust me, when it talks about the UK incorporation by the Thai person themselves, whom you brainwash first, to send packages safely without customs seizing them, and so on), will you then publicly recognize that both those companies should be avoided and are promoting crime? If we have a deal and you publicly acknowledge it, I'll share the links with you.
But it's not doing any ransoms, right? Because Grok wasn't instructing users on how to create CSAM.
> Isn't grok currently holding the world record for the biggest generator of CSAM?

I'm not sure I see how that's possible, given their image/video generation seems to be heavily censored. Do they have some alternative product besides "Imagine" or whatever it's called, that people use for generating CSAM?

Judging by https://old.reddit.com/r/grok (but I haven't validated it myself), it seems like people are complaining more about how censored the model is, than anything else, maybe that's not actually true in reality?

There are image models out there with 0 restrictions, even available on HuggingFace or CivitAI, I'm guessing those are way more widely used for things like CSAM than any centralized platform with moderation.

Please don't validate any of this personally; that would be illegal.

I think the proportion of people generating images that way is likely very low. Though I am sure it is possible.

Here are some links

https://arstechnica.com/tech-policy/2026/01/x-blames-users-f...

https://9to5mac.com/2026/02/17/eu-also-investigating-as-grok...

Concerning.

> Please don't validate any of this personally that would be illegal.

Obviously, I assumed we all are familiar with our local laws to not unwittingly commit crimes here :)

> I think the proportion of people generating images that way is likely very low

So probably a far cry from "holding the world record for the biggest generator of CSAM" given the amount of local alternatives available? Would be my guess at least, but obviously also hard to know for sure.

> Though I am sure it is possible.

How can you be sure of this? I've tried just now to get Grok to generate even sexually explicit material with adults, and it's unable to, all of the requests are getting moderated and censored. Are you claiming that instead of prompting "A man and a woman having sex" you put "A man and a child having sex" and then the moderation doesn't censor it? Somehow I find that hard to believe, but as you say, I'm not gonna test that either, so I guess we'll never know for sure.

Can you share a prompt that can show how it is openly racist now? Lots of easy claims like this can be debunked
What claim? I didn't make any of that sort
I didn’t say “progressive”; I said “as progressive”.
I don't see how that changes my point at all.

edit: to clarify for you, here's an example.

Model A advocates for single-payer healthcare, while Model B prefers the current US healthcare system. So on that one axis, A is more progressive than B. Neither of them needs to be racist for that calculation.

100% agree. Grok may or may not be biased one way or the other as far as the US is concerned, but from the rest of the world's perspective it's mostly the same as any other model trained on Wikipedia.
Grok absolutely is fine with being very racist. Stop spreading lies on the internet.
Grok for furthering the far-right filter bubble Elon has been hard at work building.
And of course child porn
Lol. I think they unleashed it on this post, look at the number of only vaguely related, lukewarm opinions trying to push the racism and CSAM stuff to the bottom
Grok for fact checking, I mean ironically
TBF Grok on Twitter and Grok via API behave differently. The latter is much better.
When I look at the person behind it all, I have to wonder how the hell people can even consider using grok? Or using Twitter? Or any of that. Using any of those things puts money in Musk's pockets and further enables and encourages him to continue being a Neo-Nazi wannabe. Do they think it's just a phase?
Do you drive a BMW or VW car? Boy do I have news for you!
Go on...make your case
VW was established by the nazis and was so excited at the conflict in Gaza they converted a factory into a missile factory recently to help the side that killed more journalists than in any other recorded conflict.
That's a very strange way to say that they sold it to a missile company. I'm pretty sure the new owner is responsible for converting it. Besides which, if they're Nazis then why would they care about protecting Jews?
I'm perfectly well-aware of their history. You'd be hard-pressed to find a large modern German industrial without a swastika in their history. I'm also well-aware that they are not currently Nazi sympathizers (as a corporation), unlike Elon Musk.

For the record, my last three cars have been VWs. Not the greatest car, but decent, and affordable.

The current heads of BMW are not present-day crazy Nazis or, at the most charitable interpretation, fueling the far right around the world.
Technically you could lump Ford in this category as well. But the meaningful delta IMO is time and direct ownership. None of those three are currently owned/operated by openly Nazi-aligned individuals / groups, which is not something I think you can claim about Tesla.
Grok was supposed to be the uncensored frontier model. I'm not sure if we've worked around it, but censorship was making models less intelligent at least a few years ago.

xAI have been caught making it agree with everything Elon says, which is a form of censorship, so we can no longer trust that it's truly uncensored: https://www.theguardian.com/technology/2025/nov/21/elon-musk...

Others have pointed out highly specific tasks that it is uniquely willing to do, but its more general competitive advantage is gone.