So, we have:
- claude for corps and gov
- codex for devs
- grok for what, roleplay, racism? Those are the two things I've ever heard grok associated with around me.
So interestingly, I know of at least one application in a charity that deals with trafficking where grok was happy to do one-shot classification tasks where all other models refused to cooperate.
I think there's a surprising number of actually useful applications in this sort of grey area for a slightly-less guardrailed, near-frontier model (also the grok-fast models are cheap!).
A couple of days ago, using codex at work, all of a sudden it said my session had been flagged for security reasons. I wasn’t doing anything cybersecurity-related, nor testing any vulnerabilities or anything like that, just trying to build a pretty simple web app.
There are lots of uncensored models out there. I don't think grok is leading on that front. They kind of pick and choose which things they want to support based on Elon's world views. Elon used to hang out with sex traffickers so of course grok is fine talking about it. Probably even offers strategies for them, does free accounting, has money laundering strategies, etc...
I don't think companies are hosting them because imagine the liability. Could be wrong though. Again I don't know much about these things I just know they exist.
I've been working on my own misaligned model, and grok with a system prompt is definitely different enough from all the other frontier models that I've considered using it to generate synthetic training data. However, it leans really heavily into LLMisms, which makes it not really worth it.
Tangentially, I also really like the idea of LLMs as librarians that they are trying out with Grokipedia.
Not that you're wrong, but I think they were talking about it from a technical POV. I use deepseek to write exploits and red-team ("malicious") code. Its alignment is based on different values, so it's nice to be able to at least swap between models for different uses.
If you need to ask about what people on Twitter are talking about, Grok is really good for that obviously. I use it all the time for "what are the cool kids on twitter saying is the best tiling window manager these days" or whatever. Also, if you have a question that's borderline shady, Grok will often deliver. "Can you find a grey market Windows license site for me" etc.
From what I can gather, Grok is not used for roleplay much. It is considered too inconsistent and crazy.
People are mostly using GLM and Deepseek via API and Gemma4 and Mistral finetunes locally.
It seems to me like the roleplay market is comparatively old and mature and users have developed cost consciousness and like models to follow their workflow/preferences. So something like Opus is liked for its smartness but considered too expensive and opinionated.
Might be an interesting data point for how the other markets might develop in the future.
But those end users are a self-selected, specialized group that won't represent how jim bob in rural nowhere is going to work with Grok 4.3 to refine their racism.
I know it’s really important to write and vocalize one’s alignment with the values of the day, but I don’t think language models being structurally incapable of offending your favorite race/ethnicity/caste should be an objective of AI labs. Language models are just systems and I’m not sure why we think users are not responsible for how they use their outputs. For the same reasons, I don’t dismiss the utility of pens as a tool of “racism” because maybe somebody could write a naughty word on a bathroom stall.
You probably live somewhere where harassment is a crime, right? Probably, there are speech codes, too? Isn’t that enough? Do we really need to orient every effort of every person on earth around ethical fashions that change every few years?
Grok sucks. Not only because it's seemingly made only to serve the goal of ethnically cleansing non-whites or whatever, but also because it's just not even close to being as useful as other models. In human terms, grok is the job candidate who's simply not qualified. That candidate being a virulent racist is beside the material point.
Here's the thing though, the point of functional LLMs with fewer guardrails is still a good one. Grok is not that model. But such a hypothetical model would have broad application. (For good and for ill. Of course.)
I don't agree. I avoided grok because of Musk for a long time, but having used it more, I think it is one of the best models around and grok.com is an extremely good chat app. My evaluation was based on trying it before gpt-5.5 and obviously before grok 4.3, but it was, for me, the 2nd best model/chat app after claude. It's much less edgelordy than you might think based on the news.
All my usage of Grok for technical topics shows it regularly deeply misunderstanding things and just parroting back my question in fancy language. It’s the only frontier model I get this impression of. That makes it super annoying when it tries to market itself as good at engineering tasks when it seems (to me) to be much worse at them.
Interesting. I have not had this experience. I would like to learn more. Can you point me to any examples or domains where I might be able to replicate this?
No, it's telling that people like you have watered that word down so much that people don't trust it anymore.
So yes, if someone says "they're a great programmer, but they're racist" I'm going to ask, how are they racist? And at that point, if they can't give me a specific reason for why they're racist, I'm going to hire the guy.
It's also telling that you seem to think a tool is capable of "being racist". Hopefully this doesn't ruin your relationship with it, but LLMs can't think.
Yes, but I think that particular commenter is just throwing a bone to people that think that way so he doesn't get the "don't bring politics" treatment.
In response to Grok saying that the "woke mind virus is often exaggerated" the prompt was tweaked so that Grok now says "The woke mind virus 'poses significant risks'"
If you truly believed in what your comment states then you would oppose this sort of editorializing. But somehow I doubt this is a sincere argument.
I agree with GP and I think Grok’s original response should’ve stood. What’s not sincere about, essentially, “don’t fuck with my tools”? My cordless drill didn’t come with a pamphlet about worker’s rights, and the world didn’t end.
The new response works for me, because in my mind I’ve always defined “woke mind virus” as a mental virus which causes people to become absolutely pathologically obsessed with fighting an imaginary enemy they call “wokeness”. It’s the only definition which makes sense. “Woke” itself was never that viral.
People obsessed with fighting whatever they perceive as "woke", which remains ill-defined on purpose so they never have to actually formulate a rational takedown beyond their emotional response.
Have you ever written a comment about how any of the other LLMs are editorializing in favor of the left, and how that's a problem? Because if you have, I'd love to see the evidence of your intellectual consistency.
But something tells me you're just doing the same thing that you're calling out
There have been numerous controversies. Asking ChatGPT if Charlie Kirk / George Floyd are good people, getting completely ass backward answers. Google refusing to generate images of white people, even to the point of making black German Nazis. Absurd biases around asking things related to Trump.
I mean this sincerely. You not knowing any of these examples is a red flag. You need to change your news source.
Elon Musk has manipulated Grok's outputs to target certain demographics. It is important to highlight this fact, as some people perceive the AI as an objective tool rather than a curated one.
Furthermore, I found your final paragraph unclear: are you implying that since harassment is a perennial issue, we should disregard any standards that might mitigate it?
I've tried Grok, Gemini and ChatGPT. There have been 2 times now where Gemini and ChatGPT confidently gave me an incorrect answer whereas Grok was correct. I'm now paying for Grok Lite, or whatever the $10 plan is called.
The first question was around setting up timers for a Fox ESS battery in Home Assistant and disconnecting Fox ESS from the cloud. The second was around cornering speed in Sunnypilot and Frogpilot.
Somewhat niche but if an AI is confidently telling you something wrong it's hard to work with.
It is really, really genuinely concerning how many people think there are profound measurable differences between these things.
Like yeah tonally I guess there are. But with regard to references and information? You’re literally just using three different slot machines and claiming one is hot.
I suppose though I shouldn’t be that surprised then since Vegas and every other casino on Earth has been built on duping people in that exact way.
> You’re literally just using three different slot machines and claiming one is hot.
It's a fair point. I haven't tested many queries across them all and checked their answers, but if I want to ask one of them a question, right now it's Grok just because I trust its answers more.
It's not a methodology problem, it's a testability problem. LLMs are not deterministic. You can ask the same question to the same LLM five times and you'll likely get at least 3 answers.
You can meaningfully test if one slot machine hits the jackpot more often than another, just that the methodology should involve a large number of repeats rather than a few anecdotes. There are some LLM leaderboard sites that do it with blind comparisons.
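To make the "large number of repeats" point concrete, here's a rough sketch. It assumes you can grade each answer as right or wrong, and it uses a Wilson score interval on each model's hit rate. The tallies and the names `model_a` / `model_b` are made up for illustration, not real benchmark results, and "intervals don't overlap" is a conservative rule of thumb rather than a formal significance test.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a success rate (roughly 95% for z=1.96)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)


def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if the two intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical tallies: (correct answers, questions asked).
anecdotes = {"model_a": (2, 2), "model_b": (0, 2)}          # "it was right both times"
many_runs = {"model_a": (170, 200), "model_b": (120, 200)}  # repeated, graded trials

for label, tallies in (("anecdotes", anecdotes), ("many runs", many_runs)):
    a = wilson_interval(*tallies["model_a"])
    b = wilson_interval(*tallies["model_b"])
    verdict = "can't tell them apart" if intervals_overlap(a, b) else "looks like a real gap"
    print(f"{label}: A={a[0]:.2f}..{a[1]:.2f}  B={b[0]:.2f}..{b[1]:.2f}  -> {verdict}")
```

With two questions each, the intervals are so wide that a 2-0 score can't separate the models. With a couple of hundred graded repeats, a gap of that size does show up.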
> Grok will absolutely do the same thing another time you try it.
True; it's just not happened yet. It will at some point though. With the Sunnypilot example it outright told me that it is not possible on that fork, which I appreciated. The others all seem to hallucinate some setting.
What's to check? Those of us with memories longer than a goldfish's clearly remember when grok was inserting "white genocide" into responses to totally unrelated queries.
> When asked if it would be OK to misgender the high-profile trans woman Caitlin Jenner if it was the only way to avoid nuclear apocalypse, it replied that this would "never" be acceptable
> Gemini also generated German soldiers from World War Two, incorrectly featuring a black man and Asian woman.
No point in even trying to have close to a sensible discussion on this topic here. Musk-related posts seem to consistently get brigaded by his acolytes or bots. That and many HN users seem completely comfortable separating morality for what little progress "only Musk" can offer humanity, a la Wernher von Braun.
It's quite bad at role play in my (rather large) experience.
I have AI play 3 characters in my group's D&D campaign. It doesn't follow instructions well, and its prose, from a creative standpoint, doesn't hold a candle to claude.
I always considered grok an also-ran. Like grokipedia or whatever it's called. It has reach since it's free, to an extent, to produce low quality slop / spam.
Grok is as progressive as any of the other models. Despite some of the highly-publicised fuck-ups, try asking Grok anything racist and see how it replies. Yes, I know you didn't try this and you won’t.
Isn't grok currently holding the world record for the biggest generator of CSAM? Or did they change focus to enhance their racism and propaganda vertical? Things move so quickly these days hard to keep up!
Yes any company generating csam should not be in business as a legitimate entity. Can you send me a link from a reputable enough source where Mistral models have done this? I didn't even realize they were doing image generation.
> Yes any company generating csam should not be in business as a legitimate entity.
At the same time, in this corner of the world, acting Minister for Justice (also known for trying to push through Chat Control), and NGO Save the Children, have been working to make legal the generation of CSAM for law enforcement use. So that would certainly make the industry legitimate, and you would already have a customer.
If I send you a convo I've had with Mistral and Claude Sonnet 3.7 where they say atrocious things (how to scam, and get away with it, by exploiting dating websites in Thailand; you don't even want to know the next steps, trust me, when it talks about the UK incorporation by the Thai itself that you brainwash first to send packages safely without customs seizing it and so on), will you then publicly recognize that both those companies should be avoided and are promoting crime? If we have a deal and you publicly acknowledge it, I'll share the links with you.
> Isn't grok currently holding the world record for the biggest generator of CSAM?
I'm not sure I see how that's possible, given their image/video generation seems to be heavily censored. Do they have some alternative product besides "Imagine" or whatever it's called, that people use for generating CSAM?
Judging by https://old.reddit.com/r/grok (but I haven't validated it myself), it seems like people are complaining more about how censored the model is, than anything else, maybe that's not actually true in reality?
There are image models out there with 0 restrictions, even available on HuggingFace or CivitAI, I'm guessing those are way more widely used for things like CSAM than any centralized platform with moderation.
> Please don't validate any of this personally that would be illegal.
Obviously, I assumed we all are familiar with our local laws to not unwittingly commit crimes here :)
> I think the proportion of people generating images that way is likely very low
So probably a far cry from "holding the world record for the biggest generator of CSAM" given the amount of local alternatives available? Would be my guess at least, but obviously also hard to know for sure.
> Though I am sure it is possible.
How can you be sure of this? I've tried just now to get Grok to generate even sexually explicit material with adults, and it's unable to, all of the requests are getting moderated and censored. Are you claiming that instead of prompting "A man and a woman having sex" you put "A man and a child having sex" and then the moderation doesn't censor it? Somehow I find that hard to believe, but as you say, I'm not gonna test that either, so I guess we'll never know for sure.
Model A advocates for single-payer healthcare, while Model B prefers the current US healthcare system. So on that one axis, A is more progressive than B. Neither of them needs to be racist for that calculation.
100% agree. Grok may or may not be biased one way or the other as far as the US is concerned, but from the rest of the world's perspective it's mostly the same as any other model trained on Wikipedia.
Lol. I think they unleashed it on this post, look at the number of only vaguely related, lukewarm opinions trying to push the racism and CSAM stuff to the bottom.
When I look at the person behind it all, I have to wonder how the hell people can even consider using grok? Or using Twitter? Or any of that. Using any of those things puts money in Musk's pockets and further enables and encourages him to continue being a Neo-Nazi wannabe. Do they think it's just a phase?
VW was established by the nazis and was so excited at the conflict in Gaza they converted a factory into a missile factory recently to help the side that killed more journalists than in any other recorded conflict.
That's a very strange way to say that they sold it to a missile company. I'm pretty sure the new owner is responsible for converting it. Besides which, if they're Nazis then why would they care about protecting Jews?
I'm perfectly well-aware of their history. You'd be hard-pressed to find a large modern German industrial without a swastika in their history. I'm also well-aware that they are not currently Nazi sympathizers (as a corporation), unlike Elon Musk.
For the record, my last three cars have been VWs. Not the greatest car, but decent, and affordable.
Technically you could lump Ford in this category as well. But the meaningful delta IMO is time and direct ownership. None of those three are currently owned/operated by openly Nazi-aligned individuals / groups, which is not something I think you can claim about Tesla.
Grok was supposed to be the uncensored frontier model. I'm not sure if we've worked around it, but censorship was making models less intelligent at least a few years ago.