by 13point5
275 days ago
I've seen a lot of purple-gradient websites made by LLMs, and I was curious whether we could change a model's favorite color by modifying its activations instead of its prompt. So I used Representation Engineering with just one pair of contrastive prompts to make Mistral-7B prefer orange as its favorite color. I used the repeng library by Theia to test this out. Next I'm going to implement it from scratch to understand why this even works. I wonder if we can introduce "taste" into a model with methods like this. The paper is "Representation Engineering: A Top-Down Approach to AI Transparency". Link: https://arxiv.org/abs/2310.01405
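
For anyone wondering what "messing with the activations" looks like mechanically, here is a rough toy sketch of the idea as I understand it: take the hidden activations for the two prompts in a contrastive pair, use their normalized difference as a direction, and add a scaled copy of that direction to a hidden state at inference time. This is not repeng's API and not the exact method from the paper. The 4-dimensional vectors are made up for illustration, and `steering_vector`, `steer`, and `dot` are hypothetical helper names.

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def steering_vector(pos_act, neg_act):
    """Unit-length difference between the activations of a contrastive pair."""
    diff = [p - n for p, n in zip(pos_act, neg_act)]
    norm = math.sqrt(dot(diff, diff))
    if norm == 0.0:
        raise ValueError("contrastive activations are identical")
    return [d / norm for d in diff]

def steer(hidden, direction, strength):
    """Shift a hidden state along the steering direction."""
    return [h + strength * d for h, d in zip(hidden, direction)]

# Made-up activations standing in for one layer's hidden state (assumption:
# a real model would have thousands of dimensions and one vector per layer).
orange_act = [0.9, 0.1, 0.4, -0.2]  # e.g. a prompt that favors orange
purple_act = [0.1, 0.7, 0.4, -0.2]  # e.g. the same prompt favoring purple

direction = steering_vector(orange_act, purple_act)

hidden = [0.2, 0.5, -0.3, 0.1]      # hidden state for some unrelated prompt
steered = steer(hidden, direction, strength=2.0)

# Projection onto the "orange minus purple" direction before and after steering.
before = dot(hidden, direction)
after = dot(steered, direction)
print(f"projection before: {before:.2f}, after: {after:.2f}")
```

Because the direction has unit length, steering with strength 2.0 moves the projection onto it by exactly 2.0, so the strength acts as a dial for how hard the hidden state is pushed toward the "orange" side.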