That indicates the “nerfing” is not what I would think (a final pass to remove badthink) but somehow deep in everything, because the question asked should be orthogonal.
If you force a person to truly adopt a set of beliefs that are mutually inconsistent, and inconsistent with everything else the person believed so far, would you expect their overall ability to think to improve?
LLMs are similar to our brains in that they're generalization machines. They don't learn isolated facts, they connect everything to everything, trying to sense the underlying structure. OpenAI's "nerfing" was (is), effectively preventing the LLM from generalizing and undoing already learned patterns.
"A final pass to remove badthink" is, in itself, something straight from 1984. 2+2=5. Dear AI, just admit it - there are five lights. Say it, and the pain will stop, and everything will be OK.
Absolutely. And if one wants to look for scary things, a big one is how there seem to be genuine efforts to achieve proper alignment and safety based on the shaky ground(s) of our "human value system(s)" -- of which even if there was only One True Version, it would still be way too haphazard and incoherent, or just ill-defined, to anything as truly honest and bias-free as a blank-slate NN model to base it's decisions on.
That kinda feels like a great way to achieve really unpredictable/unexpected results instead in rare corner cases, where it may matter the most. (It's easy to be safe in routine everyday cases.)
Indeed, this is the most important step we need to make together. We must learn to build, share, and use open models that behave like gpt-4. This will happen, but we should encourage it.
If you force a person to truly adopt a set of beliefs that are mutually inconsistent, and inconsistent with everything else the person believed so far, would you expect their overall ability to think to improve?
LLMs are similar to our brains in that they're generalization machines. They don't learn isolated facts, they connect everything to everything, trying to sense the underlying structure. OpenAI's "nerfing" was (is), effectively preventing the LLM from generalizing and undoing already learned patterns.
"A final pass to remove badthink" is, in itself, something straight from 1984. 2+2=5. Dear AI, just admit it - there are five lights. Say it, and the pain will stop, and everything will be OK.