by jncraton
832 days ago
You might be interested in "Text Embeddings Reveal (Almost) As Much As Text":

> We train our model to decode text embeddings from two state-of-the-art embedding models, and also show that our model can recover important personal information (full names) from a dataset of clinical notes.

https://arxiv.org/pdf/2310.06816.pdf

There's certainly information loss, but there is also a lot of information still present.
“a multi-step method that iteratively corrects and re-embeds text is able to recover 92% of 32-token text inputs exactly”.
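
For intuition about what "iteratively corrects and re-embeds" can look like, here is a toy sketch. It is not the paper's method: the paper trains a model to propose corrections, whereas this swaps in a plain greedy search. `toy_embed`, `VOCAB`, and `invert` are all made-up names for illustration, and the "embedding" is just a hashed, position-aware bag of words, not a real embedding model.

```python
import hashlib
import math

# Hypothetical tiny vocabulary; a real attack would search over a full tokenizer vocab.
VOCAB = ["the", "patient", "doctor", "saw", "a", "on", "monday", "friday"]
DIM = 32  # matches the 32-byte SHA-256 digest used below


def toy_embed(tokens):
    """Stand-in for an embedding model: sum of hashed (position, token) vectors, L2-normalized."""
    vec = [0.0] * DIM
    for pos, tok in enumerate(tokens):
        digest = hashlib.sha256(f"{pos}:{tok}".encode()).digest()
        for i in range(DIM):
            vec[i] += digest[i] / 255.0 - 0.5
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity of two already-normalized vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def invert(target, length, max_rounds=10):
    """Greedy loop: propose a one-token edit, re-embed, keep it only if it moves closer to target."""
    guess = [VOCAB[0]] * length
    best = cosine(toy_embed(guess), target)
    for _ in range(max_rounds):
        improved = False
        for pos in range(length):
            for cand in VOCAB:
                trial = guess[:pos] + [cand] + guess[pos + 1:]
                score = cosine(toy_embed(trial), target)
                if score > best + 1e-12:
                    guess, best, improved = trial, score, True
        if not improved:
            break  # no single-token edit helps any more
    return guess, best


if __name__ == "__main__":
    secret = ["the", "doctor", "saw", "a", "patient"]
    guess, score = invert(toy_embed(secret), len(secret))
    print(guess, round(score, 4))
```

The only thing the attacker sees here is the target vector and the ability to embed their own guesses. Greedy search like this can get stuck short of the exact text, which is presumably part of why the real thing uses a trained corrector instead.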