by NitpickLawyer
34 days ago
> This project supports steering with single-vector activation directions; [...] This is also useful for cybersecurity researchers who want to reduce a model's willingness to provide dual-use or offensive security guidance.

Wink wink, nudge nudge. I have a feeling most cybersec researchers would only be interested in negative values of "reduce" :D