Hacker News
by ethan_smith 323 days ago
Preventative steering works by modifying activations during training rather than weights post-training, which preserves model capabilities while suppressing unwanted behaviors at their representational source.
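
For anyone who wants the mechanics spelled out, here is a toy sketch of the activation-side part, in plain Python with no ML framework. Everything named here (`steer`, `forward`, `trait_direction`, `alpha`) is hypothetical and made up for illustration, not taken from any library or paper. It assumes the steering vector is added to a layer's activations only during the training forward pass and dropped at inference; the sign and size of `alpha` are also assumptions.

```python
import math

def unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]

def projection(hidden, direction):
    """Scalar component of `hidden` along `direction`."""
    return sum(h * d for h, d in zip(hidden, unit(direction)))

def steer(hidden, direction, alpha):
    """Add alpha * unit(direction) to one activation vector."""
    return [h + alpha * d for h, d in zip(hidden, unit(direction))]

def forward(hidden, direction, alpha, training):
    """Hypothetical layer hook: steer only while training (an assumption)."""
    return steer(hidden, direction, alpha) if training else list(hidden)

hidden = [1.0, 2.0, 0.5]            # made-up activations at one layer
trait_direction = [0.0, 3.0, 4.0]   # made-up direction for the unwanted trait

train_out = forward(hidden, trait_direction, alpha=2.0, training=True)
eval_out = forward(hidden, trait_direction, alpha=2.0, training=False)

# The training-time activation moves by exactly alpha along the trait
# direction; the inference-time activation is left untouched.
shift = projection(train_out, trait_direction) - projection(hidden, trait_direction)
print(round(shift, 6), eval_out == hidden)
```

In a real setup the same addition would presumably sit in a forward hook on one or more transformer layers while the optimizer still updates the weights as usual; the sketch only shows what happens to the activations.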