by 13point5 322 days ago

I've seen a lot of purple gradient websites made by LLMs, and I was curious whether we could change a model's favorite color by editing its activations instead of its prompt.

So I used Representation Engineering with a single pair of contrastive prompts to make Mistral-7B prefer orange as its favorite color.

I used the repeng library by Theia to test this out.

Next I'm gonna implement it from scratch to understand why this even works.
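
For a rough idea of what a from-scratch version could look like, here is a toy sketch of the core idea: take the hidden states for the two contrastive prompts, use their difference as a steering direction, and add a scaled copy of it to other hidden states. Everything here is an assumption for illustration. The vectors are made-up numbers rather than real Mistral-7B activations, the function names are hypothetical, and this is not repeng's API or necessarily how it computes its vectors.

```python
# Toy sketch of activation steering with one contrastive pair.
# The vectors below are made-up stand-ins for a layer's hidden states;
# a real run would read them out of the model instead.

def control_vector(h_pos, h_neg):
    """Difference of the two hidden states, scaled to unit length."""
    diff = [p - n for p, n in zip(h_pos, h_neg)]
    norm = sum(d * d for d in diff) ** 0.5
    return [d / norm for d in diff]

def steer(hidden, direction, strength):
    """Shift a hidden state along the control direction."""
    return [h + strength * d for h, d in zip(hidden, direction)]

def projection(hidden, direction):
    """Component of a hidden state along the control direction."""
    return sum(h * d for h, d in zip(hidden, direction))

# Hypothetical activations for an "orange" prompt and a "purple" prompt.
h_orange = [1.0, 0.0, 2.0, 0.5]
h_purple = [1.0, 2.0, 0.0, 0.5]

direction = control_vector(h_orange, h_purple)

# Hypothetical hidden state from some unrelated prompt.
h = [0.3, 1.0, 0.2, -0.4]
h_steered = steer(h, direction, strength=4.0)

print("before:", projection(h, direction))
print("after: ", projection(h_steered, direction))
```

Since the direction has unit length, steering with strength 4.0 moves the projection onto it by exactly 4.0 and leaves the coordinates where the two prompts agree untouched. In a real model the same addition would have to happen inside the forward pass at one or more layers, which is the part a library handles for you.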

I wonder if we can introduce "taste" into a model with methods like this.

The paper is "Representation Engineering: A Top-Down Approach to AI Transparency".

Link: https://arxiv.org/abs/2310.01405