

by zbentley 142 days ago

> with proper security controls on it

That's the hard part: how?

With the right prompt, the confined AI can behave as maliciously (and cleverly) as a human adversary--obfuscating/concealing sensitive data it manipulates and so on--so how would you implement security controls there?

It's definitely possible, but it's also definitely not trivial. "I want to de-risk traffic to/from a system that is potentially an adversary" is ... most of infosec--the entire field--I think. In other words, it's a huge problem whose solutions require lots of judgement calls, expertise, and layered solutions, not something simple like "just slap a firewall on it and look for regex strings matching credit card numbers and you're all set".
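To make the "regex for credit card numbers" point concrete, here is a minimal sketch of why that kind of filter is weak against an adversary that is actively trying to hide data. Everything in it is an assumption for illustration: `CARD_RE`, `egress_filter_blocks`, and the payload strings are hypothetical, and the card number is the commonly used test number, not real data.

```python
import base64
import re

# Hypothetical egress filter: flags runs of 13-16 digits, optionally
# separated by single spaces or dashes. A stand-in for "look for regex
# strings matching credit card numbers".
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def egress_filter_blocks(payload: str) -> bool:
    """Return True if the naive filter would block this outbound payload."""
    return CARD_RE.search(payload) is not None


card = "4111 1111 1111 1111"  # widely used test number, not a real card

# Case 1: the data leaves in the clear, so the regex sees it.
plain = f"card on file: {card}"

# Case 2: the same data, base64-encoded before it leaves.
encoded = "note: " + base64.b64encode(card.encode()).decode()

# Case 3: the same data, spelled out as words.
spelled = "card on file: " + " ".join(
    DIGIT_WORDS[int(ch)] for ch in card if ch.isdigit()
)

for label, payload in [("plain", plain), ("encoded", encoded), ("spelled", spelled)]:
    print(label, "blocked" if egress_filter_blocks(payload) else "passed")
```

Only the cleartext case trips the filter; the two trivially transformed payloads carry the same information straight past it. That is not an argument against filtering at all, just a reason it can't be the only layer.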

1 comment

johnsmith1840 142 days ago

Yeah, I'm definitely not suggesting it's easy.

Simply put, the problem is as difficult as this one:

Given a human running your system, how do you prevent them from damaging it? AI is effectively the same problem.

Outsourcing has a lot of interesting solutions around this. Those setups already focus heavily on the "not entirely trusted agent" case when giving someone access to secure systems. They aren't perfect, but it's a good place to learn.
