by hijohnnylin
45 days ago
in GG Claude, they applied steering to Claude to make it think about the Golden Gate Bridge all the time. here, they don't modify/steer the base model. they train other models that specialize in reading the internals of the base model, so that they can surface reasoning/thoughts that the model might not explicitly tell you. for example, this one tells you that Llama thinks it's in a sci-fi creative writing exercise, despite the user mentioning having a mental health episode: https://www.neuronpedia.org/nla/cmonzq63g0003rlh8xi9onjnn
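
a rough toy sketch of the steering vs. reading difference, in case it helps. everything in it is made up for illustration (`base_layer`, `steer`, `read_activations`, the 3-number "activations", both vectors). the real systems operate on actual transformer activations, and the reader is a trained model rather than a dot product, so treat this as an analogy and not as how either one is implemented:

```python
# Toy contrast between steering and reading.
# All names and numbers are hypothetical, chosen only to show the idea.

def base_layer(hidden):
    # Stand-in for the rest of the base model: just sums the activations.
    return sum(hidden)

def steer(hidden, direction, strength):
    # Steering: change the activations before the model keeps computing,
    # so the base model's own output changes.
    return [h + strength * d for h, d in zip(hidden, direction)]

def read_activations(hidden, probe):
    # Reading: a separate function looks at the activations and reports a score.
    # It returns a new value and never writes to `hidden`.
    return sum(h * p for h, p in zip(hidden, probe))

hidden = [0.5, -1.0, 2.0]
bridge_direction = [1.0, 0.0, 0.0]  # hypothetical "Golden Gate" direction
fiction_probe = [0.0, 0.0, 1.0]     # hypothetical "thinks this is fiction" reader

plain_output = base_layer(hidden)
steered_output = base_layer(steer(hidden, bridge_direction, 10.0))
fiction_score = read_activations(hidden, fiction_probe)
```

in the steered case the base model's output moves because its activations were edited. in the reading case `hidden` and `plain_output` are untouched, and you just get an extra signal on the side.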