Yes, humans have this general problem too, but they’re far less vulnerable to it.
By design, the model can output anything given the right input.
This approach will always be vulnerable in the ways we talk about here; we can only raise the guardrails around it.
I think one of the best ways to get truly secure AI agents is to build better natural-language AIs that are far less of a black box.
But I don't know enough about progress on this side.