Summary from the authors:

- Different prompts always map to different embeddings, and this property can be used to recover input tokens from individual embeddings in latent space.
- Injectivity is not accidental, but a structural property of language models.
- Across billions of prompt pairs and several model sizes, we find no collisions: no two prompts are mapped to the same hidden states.
- We introduce SipIt, an algorithm that exactly reconstructs the input from hidden states in guaranteed linear time.
- This impacts privacy, deletion, and compliance: once data enters a Transformer, it remains recoverable.

Surely that's a stretch... Typically, the only thing that leaves a transformer is its output text, which cannot be used to recover the input.
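
For intuition about what "reconstructs the input from hidden states" could mean mechanically, here is a toy sketch. It is not SipIt and not a Transformer: it assumes a tiny made-up causal recurrent map (the names `EMBED`, `W`, `step`, `hidden_states`, and `invert` are all hypothetical) in which the state at each position depends only on the tokens up to that position. Under that assumption, and assuming no two candidate tokens produce the same state, the prompt can be recovered one position at a time by trying each vocabulary entry and keeping the one whose state matches the observed one. Note that this needs access to the per-position hidden states, not the output text.

```python
import math
import random

VOCAB_SIZE = 50  # toy vocabulary size (hypothetical)
DIM = 8          # toy hidden width (hypothetical)

rng = random.Random(0)
# Random toy parameters standing in for a trained model (assumption).
EMBED = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(VOCAB_SIZE)]
W = [[rng.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(DIM)]


def step(prev_state, token):
    """One causal update: the new state depends only on the prefix so far."""
    return [
        math.tanh(sum(W[i][j] * prev_state[j] for j in range(DIM)) + EMBED[token][i])
        for i in range(DIM)
    ]


def hidden_states(tokens):
    """Return the toy hidden state at every position of the prompt."""
    state = [0.0] * DIM
    states = []
    for tok in tokens:
        state = step(state, tok)
        states.append(state)
    return states


def invert(states):
    """Recover tokens left to right by matching each observed state."""
    recovered = []
    state = [0.0] * DIM
    for observed in states:
        # Try every vocabulary entry and keep the closest match.
        best = min(
            range(VOCAB_SIZE),
            key=lambda v: math.dist(step(state, v), observed),
        )
        recovered.append(best)
        state = step(state, best)
    return recovered


prompt = [rng.randrange(VOCAB_SIZE) for _ in range(12)]
recovered = invert(hidden_states(prompt))
print(recovered == prompt)
```

The loop makes a single pass over positions and tries at most `VOCAB_SIZE` candidates at each one, so the work grows linearly with prompt length for a fixed vocabulary. Whether anything like this carries over to a real model depends on the injectivity property the authors claim.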