by wat10000 444 days ago
The thing is, an LLM agent could be subverted with an HN comment pretty easily, if its task happened to take it to HN.

Yes, humans have this general problem too, but they’re far less vulnerable to it.

1 comment

Yes, I agree. My point was more about the way we currently build LLM agents, where they are essentially black boxes that act on text.

By design, they can output anything given the right input.

This approach will always be vulnerable in the ways we're discussing here; we can only raise the guardrails around it.

I think one of the best ways to get truly secure AI agents is to build better natural-language AIs that are far less black-box-y.

But I don't know enough about the progress on that front.