by andrewflnr
389 days ago
What I want to know: would it do this if it hadn't been trained on a corpus that contained discussion of this kind of behavior? I mean, they must have slurped up all the discussions about AI alignment for their training, right? A little further out there: would it still do this if trained on data that contained no mentions or instances of blackmail at all? Can it actually invent that concept?

The behaviors/outputs in this corpus are what it reproduces.