by Turn_Trout
477 days ago
They ran (at least) two control conditions. In one, they finetuned on secure code instead of insecure code -- no misaligned behavior. In the other, they finetuned on the same insecure code, but added a request for insecure code to the training prompts. Also no misaligned behavior. So the misalignment isn't just catastrophic forgetting from training on 6K examples.