Hacker News
by yodon 28 days ago
Real question, not meant from a tinfoil-hat perspective: now that it's been shown that the censorship can be observed, how long before we see serious obfuscation of censorship circuits in LLMs?
1 comment

You can actually de-censor an LLM without understanding mechanistically how it works. (See R1 1776.)

So I don't think there'll be an effort to "obfuscate" the circuits, since hiding them wouldn't prevent that kind of de-censoring.