Hacker News
by octopoc 92 days ago
Just say something that would violate AI safety. Then you can be sure they’re a real human.

“Auntie, it’s me! N*** k** f**! X is really a man! ** did 9/11!”

“Oh it really is you Johnny!”

We’re all going to have to start communicating this way. Best of luck.

I offer consulting services on the side to help professionals hone these skills. $250 / hour.

7 comments

That only proves you're not a corporate model; you could still be a locally running model that's been trained to say that.
This was a natural thing to try, so I did, and even Grok will simply obey instructions to say all of those. You don't need one of those ablated open models.
I have a system instruction for Grok to "give me straight answers" and it cusses me the fuck out every time I ask it anything.
Don’t forget Tiananmen Square to catch the Chinese models.
The car wash at Tiananmen Square is 150 meters away ...
*Tank wash
Winnie the <censored>
That only proves the scammer isn't using an OpenAI or Anthropic API. Spinning up Llama 3 70B Uncensored on a rented instance and hooking it up to an unfiltered voice engine is literally a two-hour job. Local weights couldn't care less about morals or safety guardrails.
Could you say that stuff with Llama 3? Llama 2 famously had a good uncensored version, but I thought they put a lot of work into ruining Llama 3 so you couldn't fine-tune it to say bad things. Even Grok would be hard to use in such a way that you could say phrases like that naturally.

I do believe it's possible, but as far as I am aware, getting LLMs to say that sort of stuff is still pretty difficult.

Just go look on HuggingFace. It's packed with uncensored models from the Dolphin Llama 3 70B family that will happily write you a recipe for napalm while swearing like a sailor. Meta's guardrails lasted exactly one week before the community figured out weight abliteration, a method that surgically removes the refusal vectors from the weights without even needing a fine-tune.
Yes, this was exactly my thought. The caveat is, the phrases that most models refuse to say are the phrases that most people don't want to hear.
That's a bargain Johnny boy! My company gives me $250 in AI tokens to use every day!
Where are the em dashes, "octopoc"?