| Great questions! Thats exact trade-offs we're navigating.
Re: Intent-aware exceptions:
CSL uses hierarchical policy composition for this. Example from our banking case study:
CSL: DOMAIN BankingGuard { VARIABLES {
action: {"TRANSFER", "WITHDRAW", "DEPOSIT"}
amount: 0..100000
country: {"TR", "US", "EU", "NK"}
is_vip: {"TRUE", "FALSE"}
kyc_level: 0..5
risk_score: 0..1
device_trust: 0..1
}
// Hard boundary: never flexible
STATE_CONSTRAINT no_sanctioned_country {
WHEN country == country
THEN country MUST NOT BE "NK"
}
// Soft boundaries: context-dependent
STATE_CONSTRAINT transfer_limit_non_vip {
WHEN action == "TRANSFER" AND is_vip == "FALSE"
THEN amount <= 1000
}
STATE_CONSTRAINT transfer_limit_vip {
WHEN action == "TRANSFER" AND is_vip == "TRUE"
THEN amount <= 10000
}
// Multi-dimensional guards (amount + device trust)
STATE_CONSTRAINT device_trust_for_medium_transfer {
WHEN action == "TRANSFER" AND amount > 300
THEN device_trust >= 0.7
}
}Variables like is_vip, risk_score, device_trust are injected at runtime by your application logic, not inferred by the LLM. The LangChain integration looks like: safe_tools = guard_tools(
tools=[transfer_tool],
guard=guard,
inject={
"is_vip": current_user.tier == "VIP", # From auth
"risk_score": fraud_model.score(context), # From ML model
"device_trust": session.device_score, # From fingerprinting
"country": geoip.lookup(ip)
}
) So the agent can't "decide" it's VIP or that the device is trusted. Those come from external systems. The policy just enforces the combinations. Your database/logging example: You'd add a purpose variable and carve out: STATE_CONSTRAINT no_user_table_writes {
WHEN action == "WRITE" AND table == "users"
THEN purpose MUST BE "AUDIT_LOG"
} If you don't inject enough context, rules become binary (allow/deny). If you inject too much, the policy becomes a replica of your business logic. We're finding the sweet spot is 6-10 context variables that encode the "security-critical dimensions" (user tier, risk, trust, geography). Re: Z3 performance: Z3 runs at compile-time, not runtime. The workflow is:
Policy compilation (once): Z3 proves logical consistency → generates pure Python functors
Runtime (per request): Functor evaluation only, no symbolic solver
Typical policy (<20 constraints): <1ms per evaluation.
We haven't stress-tested 100+ concurrent yet (Alpha), but since runtime is stateless Python, it should scale horizontally. The bottleneck would be the LangChain overhead, not CSL.
Your concern about permissiveness is spot-on. We're addressing this in Phase 2 (TLA+) by adding temporal logic: instead of "block all DB writes," you can express "allow writes if preceded by read within 5 actions." This gives you state-aware permissions without making rules combinatorially complex.
The current Z3 engine is intentionally conservative. TLA+ will add the flexibility production systems need.
Appreciate the pushback—this is exactly the feedback we need at Alpha stage. If you have a specific use case in mind, I'd love to test CSL against it. |