by nutrientharvest
739 days ago
Usually, "uncensored" models have been made by instruction-tuning from scratch (i.e. starting from a pretrained-only model) on a dataset that contains no refusals, so it's hard to compare one directly to a "censored" model: it's a whole different thing, not an "uncensored" version of one. More recently, a technique called "orthogonal activation steering", aka "abliteration", has emerged, which claims to edit refusals out of a model without otherwise affecting it. I don't know how well that works, though; it's only been around for a few weeks.
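For anyone wondering what "editing refusals out" means mechanically, here's a toy sketch of the idea as I understand it: estimate a "refusal direction" as the difference between the mean activations on prompts the model refuses and prompts it answers, then project that direction out. The function names and the 3-dimensional numbers are made up for illustration. Real implementations work on an LLM's internal activations or weights, and this isn't any particular project's code.

```python
import math

def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def refusal_direction(refused_acts, complied_acts):
    """Unit vector pointing from the mean 'complied' activation to the mean 'refused' one."""
    diff = [r - c for r, c in zip(mean_vector(refused_acts), mean_vector(complied_acts))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]

def ablate(activation, direction):
    """Remove the component of `activation` along the unit vector `direction`."""
    dot = sum(a * d for a, d in zip(activation, direction))
    return [a - dot * d for a, d in zip(activation, direction)]

# Toy "activations" (hypothetical numbers, not taken from any real model).
refused = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
complied = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]

direction = refusal_direction(refused, complied)
edited = ablate([5.0, 2.0, -1.0], direction)
```

After `ablate`, the edited vector has no component left along `direction`, while everything orthogonal to it is untouched. That's the sense in which the technique claims to leave the rest of the model alone.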
I'll try command-r, though. It wasn't on my list because nothing indicated what it was good at.