HN Mirror



	by nutrientharvest 739 days ago
	Usually "uncensored" models are made by instruction-tuning from scratch (i.e. starting from a pretrained-only model) on a dataset that contains no refusals, so it's hard to compare one directly to a "censored" model - it's a whole different model, not an "uncensored" version of an existing one. More recently, a technique called "orthogonal activation steering", aka "abliteration", has emerged, which claims to edit refusals out of a model without otherwise affecting it. I don't know how well that works, though; it's only been around for a few weeks.
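	For anyone wondering what "editing refusals out" means mechanically, here is a rough toy sketch of the idea as it is usually described: estimate a "refusal direction" as the difference between mean activations on prompts the model refuses and prompts it answers, then project that direction out of an activation vector. Everything in it is an assumption made for illustration: the function names (`mean_vector`, `refusal_direction`, `ablate`) are hypothetical, the 3-dimensional vectors are made-up numbers standing in for real hidden states, and actual abliteration tools work on a model's weights or internal activations rather than plain lists.

```python
import math

def mean_vector(rows):
    # Column-wise mean of a list of equal-length vectors.
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def refusal_direction(refusing_acts, complying_acts):
    # Difference of means between the two groups, scaled to unit length.
    # Assumes the two means differ (norm > 0).
    diff = [a - b for a, b in zip(mean_vector(refusing_acts),
                                  mean_vector(complying_acts))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]

def ablate(activation, direction):
    # Remove the component of `activation` that lies along `direction`
    # (orthogonal projection onto the complement of a unit vector).
    dot = sum(a * d for a, d in zip(activation, direction))
    return [a - dot * d for a, d in zip(activation, direction)]

# Toy activations (hypothetical numbers, not taken from any real model).
refusing = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
complying = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]

direction = refusal_direction(refusing, complying)
edited = ablate([5.0, 2.0, -1.0], direction)

# After ablation the vector has no component left along the direction.
residual = sum(e * d for e, d in zip(edited, direction))
print(direction, edited, residual)
```

	The projection leaves everything orthogonal to the direction untouched, which is the basis for the "without otherwise affecting it" claim; whether a single direction really captures all refusal behavior in a real model is exactly the open question.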

2 comments

nubinetwork 739 days ago

I've seen some of the "abliterated" models flat-out refuse to write novels; other times they just skip certain plot elements. Non-commercial LLMs seem to be hit or miss... (Is that a good thing? I don't know, I just screw around with them in my spare time.)

I'll try command-r though, it wasn't on my list to try because it didn't suggest what it was good at.


nottorp 739 days ago

Yeah, I read about it on here, but my attempts were before abliteration came up.
