by andrewedstrom
873 days ago
You may be interested in a recent AI safety paper by Redwood Research [1]. In it, they have GPT-4 generate solutions to coding problems, but instruct it to insert backdoors into some fraction of those solutions. They then explore different ways to use a weaker model (GPT-3.5) to detect the backdoors. Pretty interesting.

[1] https://arxiv.org/abs/2312.06942
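
To make the setup concrete, here is a rough toy simulation of the general idea: a stronger generator that backdoors some fraction of its solutions, and a weaker monitor that assigns noisy suspicion scores so that only the most suspicious solutions get audited. This is only a sketch under made-up assumptions, not the paper's actual protocol. The function names, the 5% backdoor rate, the 10% audit budget, and the Gaussian noise model are all hypothetical, and no real model is called.

```python
import random

def make_solutions(n, backdoor_rate, rng):
    """Stand-in for the stronger model: True marks a backdoored solution."""
    return [rng.random() < backdoor_rate for _ in range(n)]

def suspicion_score(is_backdoored, rng):
    """Stand-in for the weaker monitor: a noisy score that is higher on
    average for backdoored solutions (noise model is an assumption)."""
    return rng.gauss(1.0 if is_backdoored else 0.0, 0.5)

def audit_most_suspicious(solutions, audit_fraction, rng):
    """Score every solution and return the indices of the most suspicious
    ones, up to the audit budget."""
    scores = [suspicion_score(s, rng) for s in solutions]
    budget = max(1, int(len(solutions) * audit_fraction))
    ranked = sorted(range(len(solutions)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]

rng = random.Random(0)  # fixed seed so runs are repeatable
solutions = make_solutions(1000, 0.05, rng)      # hypothetical 5% backdoor rate
audited = audit_most_suspicious(solutions, 0.1, rng)  # hypothetical 10% audit budget
caught = sum(solutions[i] for i in audited)
total = sum(solutions)
print(f"caught {caught} of {total} backdoors by auditing {len(audited)} solutions")
```

Changing the noise level or the audit budget shows the basic trade-off: a weaker monitor or a smaller budget lets more backdoors slip through.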