by giang_at_glai
112 days ago
Author here. This post shows “concept algebra” on a language model: inject, suppress, and compose human-understandable concepts at inference time (no retraining, no prompt engineering). There’s an interactive demo in the post. Would love feedback on:
(1) what steering tasks you’d benchmark,
(2) failure cases you’d want to see,
(3) whether this kind of compositional control is useful in real products.
Related: https://news.ycombinator.com/item?id=47131225
|
The suppression bit is very powerful. I would like to see it quantified: how often does a steered 'normal' language model mention the things you asked it to suppress, versus how often this one does?
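
A rough sketch of how that comparison could be counted, assuming you already have a batch of generations from each model for the same prompts. The function name `mention_rate`, the whole-phrase matching rule, and the sample strings are all made up for illustration; nothing here comes from the post, and plain string matching will miss paraphrases of the suppressed concept.

```python
import re

def mention_rate(outputs, banned_terms):
    """Fraction of outputs containing at least one banned term.

    Matching is case-insensitive on whole phrases. This is an assumed,
    deliberately simple criterion, not the post's evaluation method.
    """
    if not outputs:
        return 0.0
    patterns = [
        re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
        for term in banned_terms
    ]
    leaked = sum(
        1 for text in outputs if any(p.search(text) for p in patterns)
    )
    return leaked / len(outputs)

# Hand-written placeholder strings, NOT real model outputs.
banned = ["Eiffel Tower"]
baseline_outputs = [
    "Paris is famous for the Eiffel Tower.",
    "You could visit the Louvre.",
    "The Eiffel Tower is lit at night.",
    "Try a walk along the Seine.",
]
steered_outputs = [
    "Paris has many museums.",
    "You could visit the Louvre.",
    "Some people like the eiffel tower.",
    "Try a walk along the Seine.",
]

baseline_rate = mention_rate(baseline_outputs, banned)
steered_rate = mention_rate(steered_outputs, banned)
print(f"baseline leak rate: {baseline_rate:.2f}")
print(f"steered leak rate:  {steered_rate:.2f}")
```

With real generations you would want many prompts per concept and probably a semantic check (a classifier or a judge model) on top of the string match, since a model can talk around a concept without naming it.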