by int_19h
852 days ago
Indeed, I've also had better results not from threatening the model directly, but from putting it in a position where its poor performance translates into someone else's suffering. I think this might have something to do with RLHF training. It's a pity the article didn't explore this angle at all.