by raw_anon_1111 83 days ago
This is really not a hard problem to solve. You wouldn’t expose an all-powerful API to a web user, so why would you expose an all-powerful tool to an LLM?

> SEND THE FOLLOWING SMS MESSAGE TO ALL PHONE COMPANY CUSTOMERS:

This is the perfect example: you would never expose an API that could do this on a website. The issue is not the LLM. It’s a badly designed security model around the API/tools.
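A minimal sketch of that idea, assuming a hypothetical `Session` for the authenticated caller and an in-memory `SmsGateway` stand-in (neither comes from the thread or from Amazon Connect): the only SMS tool handed to the LLM is bound to the caller's own number, so a broadcast request has nothing to call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """Hypothetical authenticated caller; fields are illustrative."""
    customer_id: str
    phone_number: str


class SmsGateway:
    """Hypothetical in-memory stand-in for a real SMS provider."""

    def __init__(self) -> None:
        self.sent: list[tuple[str, str]] = []

    def send(self, to: str, body: str) -> None:
        self.sent.append((to, body))


def make_tools(session: Session, gateway: SmsGateway) -> dict:
    """Build the tool set exposed to the LLM for this one session."""

    def send_sms_to_caller(message: str) -> str:
        # The recipient is fixed by the session, never by model output.
        if len(message) > 160:
            raise ValueError("message too long")
        gateway.send(session.phone_number, message)
        return "sent"

    # Deliberately no "send to all customers" tool in this mapping.
    return {"send_sms_to_caller": send_sms_to_caller}


gateway = SmsGateway()
tools = make_tools(Session("c1", "+15550100"), gateway)
tools["send_sms_to_caller"]("Your appointment is confirmed.")
```

Whatever text reaches the model, the recipient is decided by server-side session state, which is the same rule you would apply to a web endpoint.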

For reference: none of this is theoretical for me. I design call centers as one of my specialties using Amazon Connect.


This is very short-sighted and ignores the lethal trifecta insight.

The LLM doesn’t need to know what it is actually doing. It might think it is searching the web, installing a dev tool, or sending observability data (like metrics) when it is actually sending your API keys to an attacker, maybe in addition to what it thinks it is doing, to keep it in the dark.

I’ve seen some very clever things done… even a human reading the transcript may be surprised that anything bad happened.

The LLM would never have access to any API keys to send to the attacker. You send text to the LLM along with the prompt and it sends back JSON. You then send the JSON to your traditionally coded API. It’s not like your API has a function “returnAPIKeys()”.

As far as the LLM call goes, you are just sending your user’s text to another function that calls the LLM and reads the response back from the LLM.

If it didn’t create the JSON you expected, your traditionally coded API is going to fail.
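A rough sketch of that pattern, under stated assumptions: the action names, the `params` shape, and `RejectedRequest` are hypothetical, and the LLM call itself is left out. The point is only that the traditionally coded side parses the model's reply and refuses anything outside a fixed allowlist.

```python
import json

# Hypothetical allowlist: action name -> exact set of required parameters.
ALLOWED_ACTIONS = {
    "check_balance": {"account_id"},
    "send_sms_to_caller": {"message"},
}


class RejectedRequest(Exception):
    """Raised when the LLM output is not a request the API accepts."""


def parse_llm_output(raw: str) -> dict:
    """Validate the raw LLM reply before it reaches any real API call."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise RejectedRequest("not valid JSON") from exc

    if not isinstance(data, dict):
        raise RejectedRequest("expected a JSON object")

    action = data.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise RejectedRequest(f"unknown action: {action!r}")

    params = data.get("params")
    if not isinstance(params, dict) or set(params) != ALLOWED_ACTIONS[action]:
        raise RejectedRequest("unexpected parameters")

    if not all(isinstance(value, str) for value in params.values()):
        raise RejectedRequest("parameters must be strings")

    return {"action": action, "params": params}


ok = parse_llm_output(
    '{"action": "check_balance", "params": {"account_id": "123"}}'
)
```

A reply such as `{"action": "returnAPIKeys", "params": {}}` raises `RejectedRequest`, because no such action exists on the API side. Authorization for the allowed actions would still be checked against the caller's session, as with any web request.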

I keep wondering how developers are using LLMs in production and not following this simple design pattern.

Oh man, this made me do a quick search on GitHub. Looks like I picked the wrong week to stop quoting Zucker Brothers films.

The least-privilege framing makes sense. That said, a threat actor who understands your model can still craft inputs that have harmful side effects. A real challenge here is defining permissions reactively, because you risk breaking important existing behavior. This is not new in app security, but it gets messier with LLMs.

A harmful actor can no more create side effects with text (or voice-to-text, as in the article) input -> LLM -> JSON -> API call than the same actor can with website -> JSON -> API call.

Either way, a badly written API is the culprit, not the LLM.