by tomviner
1200 days ago
There's no generic solution as yet. Bing's Sydney was instructed that its rules were "confidential and permanent", yet it divulged and broke them with only a little misdirection. Is this just the first taste of AI alignment proving to be a fundamentally hard problem?

In a sense, this is the same problem as "how do I trust a person not to screw up and do something against instructions?" And the answer is that you can minimize the probability of that through training, but it never becomes so unlikely that you can disregard it. Which is why we have things like hardwired fail-safes in heavy machinery, etc.