Doesn't the user provide the input that's feed to that function calling the LLM tho? Prompt hacking is a bit like sql injection in my mind but we don't have ORM's yet
This would be a concern if we are feeding the raw user input and feed it directly into an LLM. In our case, we are not simply a wrapper over an LLM.
There are multiple parsing and rule-based steps done to the input schemas - we extract specific pieces from the schemas and convert them to our internal format before feeding it our models. Thus, it mitigates such malicious behavior.
Thanks for the answer, I just found out about kor on twitter and made me think back of this thread, sharing in case it's of your interest https://eyurtsev.github.io/kor/
There are multiple parsing and rule-based steps done to the input schemas - we extract specific pieces from the schemas and convert them to our internal format before feeding it our models. Thus, it mitigates such malicious behavior.