Hacker News
by orsorna 51 days ago
It's explained by the near impossibility of isolating requests from each other, and chain of custody of divulged information.

If I send a prompt from identity A, which is the true user identity, you have possibly sent all of identity A's metadata to be ingested alongside the prompt to generate response X.

If I /then/ send the prompt from identity B, the prompt has been answered before with metadata from identity A. The black box can consult metadata from response X to generate response Y, thus possibly correlating response Y with the prompt sent by identity A.


May I ask respectfully if you understand how these models work?

They're not continuously trained. They have a context window, and the previous user's request is not inside the second user's context window. Is your claim that when the second prompt comes in, Anthropic searches previous queries and injects the answer into the context window?
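
To make the distinction concrete, here is a toy sketch in Python. It is not how any real provider works; `Request`, `build_context_stateless`, and `HypotheticalRetrievalLayer` are names made up for illustration. The first function models what I'm describing: the context holds only what the current request supplies. The class models the extra server-side step the claim would require, where an earlier answer is looked up and injected.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    identity: str
    prompt: str


def build_context_stateless(req: Request) -> list[str]:
    """Context contains only what this request supplies."""
    return [f"user: {req.prompt}"]


@dataclass
class HypotheticalRetrievalLayer:
    """Purely hypothetical server-side store of past prompt/response pairs."""

    history: dict[str, str] = field(default_factory=dict)

    def record(self, req: Request, response: str) -> None:
        # Store the response keyed by prompt text, tagged with who asked.
        self.history[req.prompt] = f"{response} (from {req.identity})"

    def build_context(self, req: Request) -> list[str]:
        # Inject any earlier answer to the same prompt ahead of the new prompt.
        context = []
        prior = self.history.get(req.prompt)
        if prior is not None:
            context.append(f"retrieved: {prior}")
        context.append(f"user: {req.prompt}")
        return context


a = Request("identity-A", "same prompt")
b = Request("identity-B", "same prompt")

# Stateless case: nothing from identity A can reach identity B's context.
ctx_b_stateless = build_context_stateless(b)

# Hypothetical injection case: identity A's response is looked up and prepended.
layer = HypotheticalRetrievalLayer()
layer.record(a, "response X")
ctx_b_injected = layer.build_context(b)
```

In the stateless case, identity B's context carries no trace of identity A. Leakage only appears once something like the hypothetical retrieval layer exists, and that is the part I'm asking whether you are claiming.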

I appreciate you clarifying my understanding; yes I understand LLMs are not continuously trained.

>Is your claim that when the second prompt comes in, Anthropic searches previous queries and injects the answer into the context window?

Yes. I would be terrified if this could be replicated with an open-weight model locally. But while we have a general understanding of how these hosted models function, we really don't know /exactly/ what they are processing.

It would not be shocking if recent KV cache was used to steer future requests. Not necessarily in a “divulge customer text” way but in a “focus on this part of the embedding space” way.