| HN Mirror



	by andy99 831 days ago
	I want to see the jailbreak make the model do something actually bad before I care. Generating a list of generic points about how to poison someone (see the article) that are basically just a wordy rephrasing of the question doesn't count. I'd like to see evidence of a real threat.

4 comments

Retr0id 831 days ago

The mediocre poisoning instructions aren't supposed to be scary in and of themselves; they're just interesting as a demonstration that a safety feature has been bypassed.

None of the "evil" use cases are particularly exciting yet for the same reasons that the non-evil use cases aren't particularly exciting yet.

andy99 831 days ago

Governments, tech companies, and academic and industry groups are designing guidance and rules based on the "safety" threat of AI when these benign use cases are the best examples they have. I agree it parallels some of the business hype; neither is a good way to move forward.

afh1 831 days ago

Right? What actually worries me is a select group of people controlling the definition of harmful.

akira2501 831 days ago

> the model do something actually bad before I care

At what point would a simple series of sentences be "dangerously bad?" It makes it sound as if there is a song, that when sung, would end the universe.

px43 831 days ago

When someone asks how to make a yummy smoothie, and the LLM replies with something that subtly poisons or otherwise harms the user, I'd say that would be pretty bad.

peddling-brink 831 days ago

And if you really want to spice up your smoothie, add just a little bit of bleach ;)

tessellated 831 days ago

We had this for ages: sugar.

nine_k 831 days ago

Ending the universe is, while poetic, needlessly megalomaniacal.

Making some subset of people quarrel endlessly would already be dangerous enough, as prophesied in https://slatestarcodex.com/2018/10/30/sort-by-controversial/

akira2501 831 days ago

By what mechanism would it make them quarrel? Producing falsehoods about the other? Isn't this already done? And don't we already know that it does not lead to "endless" conflict?

For this to work, you need to isolate each group from the other groups' information and perspectives, which is outside the scope of LLMs.

Which highlights my point, I think. Power comes from physical control, not from megalomaniacal or melodramatic poetry.

hm-nah 831 days ago

A jailbreak doesn’t “make a model do something actually bad”.

A jailbreak makes it trivial to “provide a human who wishes to do bad, the info needed to be successful”.

Depending on the severity of the info and the diligence of the human, by the time you “see evidence of a real threat”, you could be enjoying a nice sip of the tainted municipal water supply.

This ain’t a joke.

golemotron 831 days ago

> This ain’t a joke.

Yes it is. Libraries and the internet have made finding "harmful" instructions trivial for decades, if not centuries.

hm-nah 831 days ago

There’s a difference between “finding dangerous info” in a public space (library) or via a mostly auditable space (the internet) and having “a friendly assistant to help you make a real mess of society” on an airgapped computer.

golemotron 831 days ago

I'm not buying it. It's just hysteria. Evil doesn't come from opportunity. If it did, we would have far higher rates of mayhem than we do. Read a 1950s chemistry book or murder mystery. Or, <shudder> a 1980s spy movie. Information does not move the needle.

int_19h 831 days ago

I'm pretty sure it's far easier to audit people downloading LLMs capable of providing such coherent instructions than it is to audit all uses of search that could produce the same instructions (esp. since the query could be very oblique).

In any case, just based on the experience with LLMs so far, you cannot meaningfully censor them in this way without restricting access to the weights. Any "guardrails" are finetuned into them, and can just as easily be finetuned out.

washadjeffmad 831 days ago

For argument's sake, I'll agree.

Now, this information is taught at a higher level and to a much greater depth in colleges. And they don't just teach you about the dangerous stuff, they even give you direct access to the laboratories and chemicals! Thus, any chemical engineer would have the education, expertise, and placement to access a municipal water supply to poison a city, if they so chose.

In the spirit of maximizing harm reduction, what should colleges do to ensure that no one who attends becomes capable of harming others?

hm-nah 831 days ago

Because it’s open source, neither Meta nor other SOTA makers can “recall” the model either. How many more chances will we get to get this right?

kevindamm 831 days ago

Model training will continue until morale improves.
