Hacker News
by
yodon
28 days ago
Real question, not intentionally meant from a tinfoil hat perspective: now that it's been shown the censorship can be viewed, how long before we see serious obfuscation of censorship circuits in LLMs?
1 comment
s314
28 days ago
You can actually de-censor an LLM without understanding how it works from a mechanistic perspective. (See R1 1776)
So I don't think there'll be effort to "obfuscate"