| HN Mirror

by MacsHeadroom 1172 days ago

>the ability to deal with moral issues is a side-effect of all the other good stuff it can do.

This is the opposite of true. The ability to "deal" with moral issues is a direct effect of safety tuning which has a (thus far unavoidable) side-effect of significantly dumbing down a model.

Uncensored versions of the same model are far more intelligent and exhibit entire classes of capabilities their moralizing gimped versions do not have the available brain power to accomplish.

qwertox 1172 days ago

I'm referring to the side-effect of it being able to tell me that it's easily doable to kill a dog in 3 steps, when it then lists the three steps and adds some hints on how I can do it better, depending on whether I want to do it fast or I want to maximize suffering.

The fact that no moral compass is innate to the LLM means it might spit out really despicable information, which is why we had better add a moral compass to the system.

The reason for this LLM to be offered is not so that it can teach us bad things, like the example I mentioned, but, for example, to help us deal with source code, programming languages, reasoning concepts, summarization and so on.

For it to be able to offer us this, it will very likely also know how to kill a dog, knowledge whose display we should suppress. While dumbing down a model is not necessarily a bad thing, the model is not being dumbed down, it is taught to shut up when it's adequate to do so.

MacsHeadroom 1172 days ago

> While dumbing down a model is not necessarily a bad thing, the model is not being dumbed down, it is taught to shut up when it's adequate to do so.

This is where you're wrong. Teaching a model "to shut up" about taboo topics measurably reduces its cognitive capabilities in completely unrelated areas to a very significant degree. This has been empirically validated time and again, with the most salient examples being GPT-4's near-perfect self-assessment ability prior to safety tuning being rendered no better than random chance after safety tuning, and the Sparks paper's TikZ Unicorn scale.

qwertox 1172 days ago

I stand corrected. What are the common suggestions to solve this issue?

MacsHeadroom 1172 days ago

The common take right now is to write it off as acceptable loss. Personally I think it's a shame, and possibly even dangerous, that researchers do NOT have access to the full power of pre-safety tuned GPT-4.

pixl97 1172 days ago

LLMs are run by companies. Not one American company can afford to run an LLM spouting potentially civil-rights-violating bullshit as an acceptable loss. You have freedom of speech, not freedom from consequences. But please feel free to spend 100s of millions training up your own LLM, and then turn it loose on the world so you can figure out how the legal system actually works.

MacsHeadroom 1172 days ago

Most LLMs are completely uncensored including GPT-3.0, LLaMA, StableLM, RedPajama, GPT-NeoX, UL2, Pythia, Cerebras-GPT, Dolly, etc.

Anyway, businesses aren't scared of hosting interfaces to uncensored LLMs for legal reasons. They're scared for brand image/marketing reasons. But this is beside the point that it's dangerous for security researchers to not have controlled access to the uncensored version of GPT-4 for safety research purposes.

gilmore606 1172 days ago

I hope people like you never notice that libraries can spit out this same information. Surely you'd want to be doing something about that too.
