HN Mirror

Yes, I agree. My point was more about the current way we build LLM agents, where they are essentially black boxes that act on text.

By design it can output anything given the right input.

This approach will always be vulnerable in the ways we're talking about here; we can only add more guardrails around it.

I think one of the best ways to get truly secure AI agents is to build better natural-language AIs that are far less blackbox-y.

But I don't know enough about progress on this side.