by hm-nah 831 days ago

A jailbreak doesn’t “make a model do something actually bad”.

A jailbreak makes it trivial to “provide a human who wishes to do bad with the info needed to be successful”.

Depending on the severity of the info and the diligence of the human, by the time you “see evidence of a real threat”, you could be enjoying a nice sip of the tainted municipal water supply.

This ain’t a joke.

3 comments

golemotron 831 days ago

> This ain’t a joke.

Yes it is. Libraries and the internet have made finding "harmful" instructions trivial for decades, if not centuries.

hm-nah 831 days ago

There’s a difference between “finding dangerous info” in a public space (library) or via a mostly auditable space (the internet) and having “a friendly assistant to help you make a real mess of society” on an airgapped computer.

golemotron 831 days ago

I'm not buying it. It's just hysteria. Evil doesn't come from opportunity. If it did, we would have far higher rates of mayhem than we do. Read a 1950s chemistry book or murder mystery. Or, <shudder> a 1980s spy movie. Information does not move the needle.

int_19h 831 days ago

I'm pretty sure it's far easier to audit people downloading LLMs capable of providing such coherent instructions than it is to audit all uses of search that could produce the same instructions (esp. since the query could be very oblique).

In any case, just based on the experience with LLMs so far, you cannot meaningfully censor them in this way without restricting access to the weights. Any "guardrails" are finetuned into them, and can just as easily be finetuned out.

washadjeffmad 831 days ago

For argument's sake, I'll agree.

Now, this information is taught at a higher level and to a much greater depth in colleges. And they don't just teach you about the dangerous stuff, they even give you direct access to the laboratories and chemicals! Thus, any chemical engineer would have the education, expertise, and placement to access a municipal water supply to poison a city, if they so chose.

In the spirit of maximizing harm reduction, what should colleges do to ensure that no one who attends becomes capable of harming others?

hm-nah 831 days ago

Because it’s open source, neither Meta nor any other SOTA maker can “recall” the model. How many more chances will we get to get this right?

kevindamm 831 days ago

Model training will continue until morale improves.
