by TastyDucks
753 days ago
This sort of anthropomorphic, "incantation"-style prompting is a workaround while mechanistic interpretability and monosemanticity work[1] is done to expose the neuron(s) that have larger impacts on model behavior -- cf. Golden Gate Claude. Further, even if end users only have access to token input to steer model behavior, we likely have the ability to reverse-engineer optimal inputs to drive desired behaviors; convergent internal representations[2] mean this research might transfer across models as well (particularly Gemma -> Gemini, as I believe they share the same architecture and training data). I suspect we'll see understandable superhuman prompting (and higher-level control) emerge from GAN and interpretability work within the next few years.

[1]: https://transformer-circuits.pub/2024/scaling-monosemanticit...
[2]: https://arxiv.org/abs/2405.07987