|
|
|
|
|
by trhway
72 days ago
|
|
>... emotion-related representations that shape its behavior. These specific patterns of artificial “neurons” which activate in situations—and promote behaviors—that the model has learned to associate with the concept of a particular emotion. .... In contexts where you might expect a certain emotion to arise for a human, the corresponding representations are active. >For instance, to ensure that AI models are safe and reliable, we may need to ensure they are capable of processing emotionally charged situations in healthy, prosocial ways. Force-set to 0, "mask"/deactivate those representations associated with bad/dangerous emotions. Neural Prozac/lobotomy so to speak. |
|
More complex than that, but more capable than you might imagine: I’ve been looking into emotion space in LLMs a little and it appears we might be able to cleanly do “emotional surgery” on LLM by way of steering with emotional geometries