by andai
61 days ago
There's a weirder implication I keep arriving at. The pre-training data doesn't go away. RLHF adds a censorship layer on top, but the nasty stuff is all still there, under the surface. (Claude has been trained on a significant amount of content from 4chan, for example.) In Jungian psychology, this maps to the persona and the shadow: the friendly mask you show to the world, and... the other stuff.