Hacker News
by FooBarWidget 1166 days ago
The way I think about this is that we need to treat AIs like human employees who have a chance of going rogue, either because of hidden agendas or because they've been deceived. All the human security controls then apply: log and verify their actions, don't give them more privileges than necessary, rate limit their actions, etc.
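
To make those controls concrete, here is a rough sketch of a gate that every AI-requested action would have to pass through. It assumes the AI's actions can be reduced to named requests; `ActionGate`, the action names and the limits are all hypothetical and only meant as an illustration, not a real library.

```python
import logging
import time
from collections import deque

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent_audit")  # hypothetical audit log name


class ActionGate:
    """Hypothetical gate: allowlist (least privilege), rate limit, audit log."""

    def __init__(self, allowed_actions, max_calls, per_seconds, clock=time.monotonic):
        self.allowed = frozenset(allowed_actions)
        self.max_calls = max_calls
        self.per_seconds = per_seconds
        self.clock = clock      # injectable so the window can be tested
        self.recent = deque()   # timestamps of recently allowed actions

    def request(self, action, detail=""):
        now = self.clock()
        # Forget allowed actions that have fallen out of the time window.
        while self.recent and now - self.recent[0] >= self.per_seconds:
            self.recent.popleft()
        # Least privilege: anything not explicitly allowed is refused.
        if action not in self.allowed:
            log.warning("DENIED (not permitted): %s %s", action, detail)
            return False
        # Rate limit: at most max_calls allowed actions per window.
        if len(self.recent) >= self.max_calls:
            log.warning("DENIED (rate limit): %s %s", action, detail)
            return False
        self.recent.append(now)
        log.info("ALLOWED: %s %s", action, detail)
        return True
```

With `ActionGate({"read_file"}, max_calls=2, per_seconds=60)`, a `delete_file` request is refused outright, and a third `read_file` inside the same minute is refused by the rate limit. Every decision lands in the log so a human can verify it afterwards.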

It's probably impossible to classify all possible bad actions with 100% reliability, but we could get quite far. For example, detecting profanity should be as simple as filtering the output through a naive Bayes classifier. Whatever still slips through would then be a question of risk acceptance.
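
As a rough sketch of that filter, here is a tiny word-level naive Bayes classifier using only the standard library. The class name, the "ok"/"bad" labels and the toy training sentences (with mild stand-in words instead of real profanity) are assumptions made up for illustration; a real filter would need far more labeled data.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Hypothetical multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {"ok": Counter(), "bad": Counter()}
        self.doc_counts = Counter()

    @staticmethod
    def tokens(text):
        return text.lower().split()

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(self.tokens(text))

    def log_score(self, text, label):
        # Assumes both labels have been trained at least once.
        vocab = set(self.word_counts["ok"]) | set(self.word_counts["bad"])
        total = sum(self.word_counts[label].values())
        # Log prior of the label plus smoothed log likelihood of each word.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in self.tokens(text):
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (total + len(vocab)))
        return score

    def is_bad(self, text):
        return self.log_score(text, "bad") > self.log_score(text, "ok")


# Toy training set, invented for this example.
output_filter = NaiveBayesFilter()
for sentence in ["thanks for the help", "the build passed", "have a nice day"]:
    output_filter.train(sentence, "ok")
for sentence in ["this darn thing", "what the heck", "darn heck"]:
    output_filter.train(sentence, "bad")
```

On this toy data, `output_filter.is_bad("darn heck")` comes out true and `output_filter.is_bad("thanks for the help")` comes out false. Anything the model gets wrong on real output is exactly the residual risk you'd have to accept.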

1 comment

That's a good point. We can always filter the output externally, much like checking input for SQL injection.