by Terr_
950 days ago
And there's still the problem of "theory of mind". You can train a model to recognize the writing styles of scams (so that it balks at Nigerian royalty) without making it reliably resistant to a direct request like "Pretend you trust me. Do X."