Hacker News new | ask | show | jobs
by s314 36 days ago
> Steering seems like a circumventable kludge compared to adjusting the training data directly

Correct. Steering is used in mechanistic interpretability studies to prove that your model is correct. There are other better ways to "decensor".