

by visarga 967 days ago

Phrase embeddings could cut sequence length by 32x, because a single embedding can carry a 32-token span almost losslessly:

> Text Embeddings Reveal (Almost) As Much As Text. ... We find that although a naïve model conditioned on the embedding performs poorly, a multi-step method that iteratively corrects and re-embeds text is able to recover 92% of 32-token text inputs exactly. We train our model to decode text embeddings from two state-of-the-art embedding models, and also show that our model can recover important personal information (full names) from a dataset of clinical notes.

https://arxiv.org/abs/2310.06816
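
To make the "iteratively corrects and re-embeds" loop concrete, here is a toy sketch of the control flow only. It is not the paper's method: the paper trains a corrector model against real embedding models, while this swaps in a greedy per-position search and a made-up embedder. `VOCAB`, `embed`, `code_row`, and `invert` are all hypothetical names, and the embedder is deliberately easy to invert (each position/token pair gets an orthogonal Walsh-Hadamard row), so it says nothing about how hard real inversion is.

```python
# Toy sketch only: NOT the paper's learned corrector or a real embedding model.
VOCAB = ["the", "patient", "doctor", "saw", "called", "a", "nurse", "today"]  # hypothetical
SEQ_LEN = 4
DIM = SEQ_LEN * len(VOCAB)  # 32, a power of two so Walsh-Hadamard rows exist


def code_row(position, token_id):
    """Walsh-Hadamard row used as the code for one (position, token) pair."""
    r = position * len(VOCAB) + token_id
    # Entry c is +1 if r & c has an even number of set bits, else -1.
    return [1 - 2 * (bin(r & c).count("1") % 2) for c in range(DIM)]


def embed(tokens):
    """Hypothetical stand-in embedder: sum of the per-position token codes."""
    vec = [0] * DIM
    for pos, tok in enumerate(tokens):
        row = code_row(pos, VOCAB.index(tok))
        vec = [v + x for v, x in zip(vec, row)]
    return vec


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def invert(target, max_rounds=5):
    """Guess, re-embed, compare with the target, keep corrections that help."""
    guess = [VOCAB[0]] * SEQ_LEN
    best = sq_dist(embed(guess), target)
    for _ in range(max_rounds):
        improved = False
        for pos in range(SEQ_LEN):
            for cand in VOCAB:
                trial = guess[:pos] + [cand] + guess[pos + 1:]
                d = sq_dist(embed(trial), target)  # re-embed the hypothesis
                if d < best:  # accept only strict improvements
                    guess, best, improved = trial, d, True
        if not improved:
            break
    return guess, best


secret = ["the", "doctor", "called", "today"]
recovered, distance = invert(embed(secret))
print(recovered, distance)
```

On this rigged setup the loop gets the sequence back exactly, because orthogonal codes make every correct substitution strictly reduce the distance. With a real dense embedding model nothing guarantees that, which is presumably why the paper needs a trained corrector rather than a search like this.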