by bandrami 18 days ago
Exfil remains the big worry for my company and the main blocker to adopting agents in general. We've brainstormed a lot, but we can't really find a way around the fact that it means feeding data we care about to software we don't have any real visibility into.

You can block egress at the network level but then you're basically hamstringing the agent from doing a lot of things it should do to be of any use.

3 comments

I think the only solution to this kind of challenge is forcing the agent to go through a proxy which handles all the authentication and authorization for the agent (thus it never has too much access to abuse), and monitors for exfiltration or prompt injections.
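A rough sketch of the policy check such a proxy might run on each outbound request, assuming a host allowlist plus a naive marker scan. The host names, markers, and the `check_egress` function are all hypothetical, and a plain substring scan is easy to evade with encoding, so treat this as an illustration of the shape rather than a real control:

```python
from typing import Tuple
from urllib.parse import urlsplit

# Hypothetical policy: hosts the agent may reach, and markers that must never leave.
ALLOWED_HOSTS = {"api.internal.example", "docs.example.org"}
SENSITIVE_MARKERS = ("CONFIDENTIAL", "EXPORT-CONTROLLED")


def check_egress(url: str, body: str) -> Tuple[bool, str]:
    """Decide whether an outbound request from the agent may leave the network."""
    host = (urlsplit(url).hostname or "").lower()
    if host not in ALLOWED_HOSTS:
        return False, f"host not allowlisted: {host or '(none)'}"
    # Scan the URL as well as the body, since data can leak through query strings.
    haystack = (url + "\n" + body).upper()
    for marker in SENSITIVE_MARKERS:
        if marker in haystack:
            return False, f"sensitive marker found: {marker}"
    return True, "ok"
```

The proxy would hold the real credentials and attach them only after this check passes, so the agent itself never sees a token it could misuse.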
Investigate running a local LLM on company-owned hardware; it's really the only way to be sure.
Well, that setup is non-negotiable (it legally has to be on premises); the issue is a model nonetheless exfiltrating data if we give it any network access.
Wouldn't a local LLM be just as vulnerable to this?
Create an anonymized/obfuscated copy of your data and let the agents use that?
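For a sense of what the obfuscated copy could involve, here is a minimal sketch that replaces email addresses with stable keyed tokens, so records still join up but the raw values never reach the agent. The key, the regex, and the `pseudonymize` function are assumptions for illustration; real data would need many more field types than emails:

```python
import hashlib
import hmac
import re

# Hypothetical key; it must live somewhere the agent cannot read.
SECRET_KEY = b"replace-me"
# Deliberately simple email pattern, good enough for a sketch.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymize(text: str) -> str:
    """Replace each email address with a stable, keyed token."""
    def _token(match):
        # Lowercase first so the same address always maps to the same token.
        raw = match.group(0).lower().encode()
        digest = hmac.new(SECRET_KEY, raw, hashlib.sha256).hexdigest()
        return f"<email:{digest[:12]}>"

    return EMAIL_RE.sub(_token, text)
```

Using a keyed HMAC rather than a bare hash means someone without the key cannot confirm a guessed address by hashing it themselves.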
That's already sounding like more work than what we would be trying to automate.
It sounded like there would be a big value unlock. Depends on your circumstances of course.
The big manual task we haven't automated is going through documents and determining "is this sensitive enough to warrant information controls?" We may just be stuck with that in the way of things.
Just out of curiosity, why would the LLM need network access for this? I.e. feeding the doc to an LLM and asking "is this sensitive information according to these criteria: [...]" should get you there most of the way, no? Probably need a handful of (carefully designed) tool calls and a human in the loop somewhere, but it seems achievable.
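Something like the following is what I have in mind, as a sketch only. `classify` stands in for an offline model call that returns a label and a confidence, `ask_human` for whatever review queue you have, and the 0.9 threshold is arbitrary; all three are assumptions, not a real API:

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Verdict:
    sensitive: bool
    decided_by: str  # "model" or "human"


def triage(
    doc: str,
    classify: Callable[[str], Tuple[bool, float]],
    ask_human: Callable[[str], bool],
    threshold: float = 0.9,
) -> Verdict:
    """Accept the model's label only when it is confident; otherwise escalate."""
    label, confidence = classify(doc)
    if confidence >= threshold:
        return Verdict(label, "model")
    # Low confidence: a person makes the call, so the model never has the last word.
    return Verdict(ask_human(doc), "human")
```

Nothing in this loop needs network access; the criteria would be passed in as part of the prompt behind `classify`.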
Because it needs to look up ITAR and NATO rules as well as current unilateral export restrictions and departmental guidance.
How would you expect an LLM to produce reasonable decisions on that anyway?
"Do these documents contain models or descriptions of (list of devices redacted for HN), or personally identifying information?" would be a great question to be able to automate, since it sucks up a lot of time that could be more profitably spent doing other things. There are costs to both Type I and Type II errors, so deterministic filters only get us so far (which isn't very far).