Hacker News new | ask | show | jobs
by ethan_smith 323 days ago
Preventative steering works by modifying activations during training rather than weights post-training, which preserves model capabilities while suppressing unwanted behaviors at their representational source.