|
|
|
|
|
by wackget
157 days ago
|
|
I don't want to trivialise someone's hard work but isn't this really just applying to LLMs what every responsible developer/sysadmin already knows: granular permissions, thoughtfully delegated? You wouldn't give every user write access to a database in any system. I'm not sure why LLMs are a special case. Is it because people have been "trusting" the LLMs to self-enforce via prompt rules instead of actually setting up granular permissions for the LLM agent process? If so, that's a user training issue and I'm not sure it needs an LLM-specific article. Secondly, FTA: > You can stop a database delete with tool filtering, but how do you stop an AI from giving bad advice in text? By using a pattern I call “reifying speech acts into tools.”
> The Rule: “You may discuss symptoms, but you are forbidden from issuing a diagnosis in text. You MUST use the provide_diagnosis tool.”
> The Interlock:
> If User = Doctor: The tool exists. Diagnosis is possible.
> If User = Patient: The tool is physically removed.
> When the tool is gone, the model cannot “hallucinate” a diagnosis because it lacks the “form” to reason and write it on. How is this any different from what I described above as trusting LLMs to self-enforce? You're not physically removing anything because the LLM can still respond with text. You're just trusting the LLM to obey what you've written. I know the next paragraph admits this, but I don't understand why it's presented like a new idea when it's not. |
|
With the pattern I'm describing, you'd: - Filter the tools list before the API call based on user permissions - Pass only allowed tools to the LLM - The model physically can't reason about calling tools that aren't in its context, blocking it at the source.
We remove it at the infrastructure layer, vs. the prompt layer.
On your second point, "layer 2," we're currently asking models to actively inhibit their training to obey the constricted action space. With Tool Reification, we'd be training the models to treat speech acts as tools and leverage that training so the model doesn't have to "obey a no"; it fails to execute a "do."