Hacker News new | ask | show | jobs
by smokel 405 days ago
Not an expert in the field, but apparently there has been some research into this. It's called inference-time intervention [1], [2].

[1] "Steering Language Models With Activation Engineering", 2023, https://arxiv.org/abs/2308.10248

[2] "Multi-Attribute Steering of Language Models via Targeted Intervention", 2025, https://arxiv.org/pdf/2502.12446