by astrange
404 days ago
> it's trivial to make models behave "actually evil" with fine-tuning, orthogonalization/abliteration, representation fine-tuning/steering, etc

It's actually pretty difficult to do this and make them useful. You can see this because Grok is a helpful liberal just like all the other models.

Evil / illiberal people don't answer questions on the internet! So there is no personality in the base model for you to uncover that is both illiberal and capable of helpfully answering questions. If they tried to make a Grok that acted like the typical new-age X user, it'd just respond to any prompt by calling you a slur you've never heard of.
|
It is not difficult to do this and make them useful at all. Please familiarize yourself with the literature.