| HN Mirror



	by yodon 28 days ago
	Real question, not intentionally meant from a tinfoil hat perspective: now that it's been shown the censorship can be viewed, how long before we see serious obfuscation of censorship circuits in LLMs?

1 comment

You can actually de-censor an LLM without understanding how it works from a mechanistic perspective. (See R1 1776)

So I don't think there'll be effort to "obfuscate".