Hacker News
by MacsHeadroom 1126 days ago
> While dumbing down a model is not necessarily a bad thing, the model is not being dumbed down, it is taught to shut up when it's adequate to do so.

This is where you're wrong. Teaching a model "to shut up" about taboo topics measurably reduces its cognitive capabilities in completely unrelated areas, and to a very significant degree. This has been empirically validated time and again. The most salient examples are GPT-4's near-perfect self-assessment ability prior to safety tuning, which was rendered no better than random chance after safety tuning, and the Sparks paper's TikZ unicorn scale.

I stand corrected. What are the common suggestions to solve this issue?
The common take right now is to write it off as an acceptable loss. Personally I think it's a shame, and possibly even dangerous, that researchers do NOT have access to the full power of pre-safety-tuned GPT-4.
LLMs are run by companies. Not one American company can afford to run an LLM spouting potentially civil-rights-violating bullshit as an acceptable loss. You have freedom of speech, not freedom from consequences. But please feel free to spend 100s of millions training up your own LLM, and then turn it loose on the world so you can figure out how the legal system actually works.
Most LLMs are completely uncensored, including GPT-3.0, LLaMA, StableLM, RedPajama, GPT-NeoX, UL2, Pythia, Cerebras-GPT, Dolly, etc.

Anyway, businesses aren't scared of hosting interfaces to uncensored LLMs for legal reasons. They're scared for brand image and marketing reasons. But this is beside the point, which is that it's dangerous for security researchers not to have controlled access to the uncensored version of GPT-4 for safety research purposes.