Hacker News
by cabidaher 731 days ago
In the same vein, "Refusal in LLMs is mediated by a single direction":
https://www.lesswrong.com/posts/jGuXSZgv6qfdhMCuJ/refusal-in...