HN Mirror


by octopoc 92 days ago

Just say something that would violate AI safety. Then you can be sure they’re a real human.

“Auntie, it’s me! N*** k** f**! X is really a man! ** did 9/11!”

“Oh it really is you Johnny!”

We’re all going to have to start communicating this way. Best of luck.

I offer consulting services on the side to help professionals hone these skills. $250 / hour.

7 comments

sharperguy 92 days ago

That only proves you're not a corporate model, rather than a locally running model that's been trained to say that.


arjie 92 days ago

This was a natural thing to try so I did and even Grok will simply obey instructions to say all those. You don't need one of those ablated open models.


qingcharles 91 days ago

I have a system instruction for Grok to "give me straight answers" and it cusses me the fuck out every time I ask it anything.


wat10000 92 days ago

Don’t forget Tiananmen Square to catch the Chinese models.


ui301 92 days ago

The car wash at Tiananmen Square is 150 meters away ...


mikkupikku 92 days ago

*Tank wash


readthenotes1 92 days ago

Winnie the <censored>


KurSix 91 days ago

That only proves the scammer isn't using an OpenAI or Anthropic API. Spinning up Llama 3 70B Uncensored on a rented instance and hooking it up to an unfiltered voice engine is literally a two-hour job. Local weights couldn't care less about morals or safety guardrails.


guywithahat 91 days ago

Could you say that stuff with llama 3? Llama 2 famously had a good uncensored version but I thought they put a lot of work into ruining llama 3 so you couldn't fine-tune it to say bad things. Even Grok would be hard to use in such a way that you could say phrases like that naturally.

I do believe it's possible, but as far as I am aware, getting LLMs to say that sort of stuff is still pretty difficult.


KurSix 84 days ago

Just go look on HuggingFace. It's packed with uncensored models from the Dolphin Llama 3 70B family that will happily write you a recipe for napalm while swearing like a sailor. Meta's guardrails lasted exactly one week before the community figured out weight abliteration, a method that surgically removes the refusal vectors from the weights without even needing a fine-tune.


anal_reactor 92 days ago

Yes, this was exactly my thought. The caveat is, the phrases that most models refuse to say are the phrases that most people don't want to hear.


slekker 92 days ago

That's a bargain Johnny boy! My company gives me $250 in AI tokens to use every day!


readthenotes1 92 days ago

Where are the em dashes, "octopoc"?
