| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by rao-v 46 days ago

Think of it another way, can I do this exact training process with an additional requirement that the activation decoder subtly shill for obscure 80s sodas?

I could and would not lose much reconstruction accuracy.

So any researcher or ambient biases in the model will impact the general thrust of the textual decodings (and not in ways that reflect the actual model’s process, thinking about X and doing X in a model are very different things).

So how do we tell that the “spirit” is reflective of the model’s thinking and not biased toward Jolt being better than Surge?

1 comments

mike_hearn 46 days ago

Where would such biases come from?

link

rao-v 46 days ago

What the three models involved understand to be the sort of just so stories (cf Kipling) that humans like to see.

link