| HN Mirror

by CGMthrowaway 274 days ago

Summary from the authors:

- Different prompts always map to different embeddings, and this property can be used to recover input tokens from individual embeddings in latent space

- Injectivity is not accidental, but a structural property of language models

- Across billions of prompt pairs and several model sizes, we find no collisions: no two prompts are mapped to the same hidden states

- We introduce SipIt, an algorithm that exactly reconstructs the input from hidden states in guaranteed linear time.

- This impacts privacy, deletion, and compliance: once data enters a Transformer, it remains recoverable.
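
To make the recovery claim concrete, here is a toy sketch of token-by-token inversion. It is not the paper's SipIt algorithm: the hash-based `step` function, `VOCAB`, and `DIM` are hypothetical stand-ins for a real Transformer, chosen only because they give a causal map that is injective with overwhelming probability. The point is that if each hidden state depends only on the prefix, and distinct prefixes give distinct states, the input can be recovered one position at a time by testing each vocabulary entry.

```python
import hashlib
import struct

# Hypothetical toy vocabulary and state width (assumptions, not from the paper).
VOCAB = ["the", "cat", "sat", "on", "mat", "yes", "no", "."]
DIM = 4


def step(prev_state, token_id):
    """Toy causal update: derive the next state from the previous state and one token."""
    payload = struct.pack(f"<{DIM}d", *prev_state) + struct.pack("<I", token_id)
    digest = hashlib.sha256(payload).digest()  # 32 bytes
    ints = struct.unpack(f"<{DIM}Q", digest)   # four unsigned 64-bit integers
    return tuple(x / 2**64 for x in ints)      # scale to floats


def hidden_states(token_ids):
    """Run the toy model and keep the state after every position."""
    state = (0.0,) * DIM
    states = []
    for token_id in token_ids:
        state = step(state, token_id)
        states.append(state)
    return states


def invert(states):
    """Recover token ids from per-position states, one position at a time."""
    state = (0.0,) * DIM
    recovered = []
    for observed in states:
        # Try every vocabulary entry; injectivity means exactly one should match.
        matches = [v for v in range(len(VOCAB)) if step(state, v) == observed]
        if len(matches) != 1:
            raise ValueError("no unique match: the toy map is not injective here")
        recovered.append(matches[0])
        state = observed
    return recovered


prompt = [0, 1, 2, 3, 0, 4, 7]  # "the cat sat on the mat ."
recovered = invert(hidden_states(prompt))
```

The work here is one pass over the sequence with at most `len(VOCAB)` candidate checks per position, so it grows linearly with prompt length. A real model would need a tolerance-based comparison rather than exact float equality.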

1 comments

orbital-decay 274 days ago

> - This impacts privacy, deletion, and compliance

Surely that's a stretch... Typically, the only thing that leaves a transformer is its output text, which cannot be used to recover the input.

mattlutze 274 days ago

If you claim, for example, that an input is not stored, but examples of internal steps of an inference run _are_ retained, then this paper may suggest a means for recovering the input prompt.

Kostchei 274 days ago

remains recoverable... for less than a training run of compute. It's a lot, but it is doable.

orbital-decay 274 days ago

Here's an output text: "Yes." Recover the exact input that led to it. (you can't, because the hidden state is already irreversibly collapsed during the sampling of each token)

The paper doesn't claim this to be possible either; they prove the reversibility of the mapping between the input and the hidden state, not the output text. Or rather "near-reversibility", i.e. collisions are technically possible, but they have to be very precisely engineered during model training and don't normally happen.
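
A minimal sketch of the distinction, using made-up numbers rather than anything from the paper: two different final-layer vectors (treated here as logits, an assumption for illustration) can yield the same emitted token, so the step from hidden state to output text is many-to-one even when the step from input to hidden state is not.

```python
def greedy_token(logits):
    """Return the index of the largest logit (greedy decoding)."""
    return max(range(len(logits)), key=logits.__getitem__)


# Two hypothetical, clearly different states...
state_a = [0.1, 2.3, -1.0]
state_b = [0.7, 5.0, 0.2]

# ...that collapse to the same output token.
token_a = greedy_token(state_a)
token_b = greedy_token(state_b)
```

Given only the emitted token index, there is no way to tell `state_a` from `state_b`, which is the collapse described above.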

Nydhal 273 days ago

If you generate a lot of output text, you can approximate the hidden state.
