HN Mirror



by jaffee 842 days ago

> embedding vectors you've calculated from the code? If so, those are likely quite easily reversible

I don't think embeddings are generally reversible... you're usually projecting onto a lower-dimensional space, and therefore losing information.
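
A minimal sketch of the information-loss argument above, assuming a made-up 3-to-2 linear projection `P` (hypothetical, not any real embedding model): two different inputs land on the same output, so the map cannot be inverted for arbitrary vectors.

```python
# Toy illustration: a linear projection from 3 dimensions down to 2.
# The matrix P is invented for this example; it is not a real embedding model.
P = [
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
]

def project(v):
    """Map a 3-d vector to 2-d by multiplying it with P."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]

a = [1.0, 2.0, 3.0]
# [1, 1, -1] lies in the null space of P, so adding it leaves the output unchanged.
b = [a[0] + 1.0, a[1] + 1.0, a[2] - 1.0]

print(project(a))  # [4.0, 5.0]
print(project(b))  # [4.0, 5.0], same output from a different input
```

Collisions like this exist for arbitrary vectors; whether real text, which fills only a small part of the input space, collides in practice is a separate question.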

3 comments

jncraton 842 days ago

You might be interested in "Text Embeddings Reveal (Almost) As Much As Text":

> We train our model to decode text embeddings from two state-of-the-art embedding models, and also show that our model can recover important personal information (full names) from a dataset of clinical notes.

https://arxiv.org/pdf/2310.06816.pdf

There's certainly information loss, but there is also a lot of information still present.


simonw 842 days ago

Yeah, that paper is what I was thinking about. https://simonwillison.net/2024/Jan/8/text-embeddings-reveal-...

“a multi-step method that iteratively corrects and re-embeds text is able to recover 92% of 32-token text inputs exactly”.
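
A toy sketch of the "correct and re-embed" loop described in that quote, under loud assumptions: `toy_embed`, `invert`, and the 27-symbol alphabet are hypothetical, the attacker is assumed to know the text length, and a brute-force character search stands in for the trained model the paper uses to propose corrections. The only thing it shares with the real method is the shape of the loop: guess, re-embed, compare against the target, keep edits that move closer.

```python
import string

ALPHABET = string.ascii_lowercase + " "  # 27 symbols; toy vocabulary (assumption)

def toy_embed(text):
    """Hypothetical stand-in for an embedding model: two numbers per character.

    Much smaller than a one-hot encoding (27 numbers per character),
    yet still enough to pin down each character.
    """
    vec = []
    for ch in text:
        code = ALPHABET.index(ch)
        vec.append(float(code % 6))
        vec.append(float(code // 6))
    return vec

def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def invert(target, length, embed=toy_embed, max_rounds=10):
    """Guess, re-embed, compare, correct; `embed` is used only as a black box."""
    guess = ["a"] * length
    best = sq_dist(embed("".join(guess)), target)
    for _ in range(max_rounds):
        improved = False
        for i in range(length):
            for ch in ALPHABET:
                if ch == guess[i]:
                    continue
                candidate = guess[:i] + [ch] + guess[i + 1:]
                d = sq_dist(embed("".join(candidate)), target)
                if d < best:  # keep the edit only if it moves closer to the target
                    guess, best, improved = candidate, d, True
        if best == 0.0 or not improved:
            break
    return "".join(guess), best

secret = "embeddings leak"
recovered, err = invert(toy_embed(secret), len(secret))
print(repr(recovered), err)  # 'embeddings leak' 0.0
```

The toy embedding keeps each character's features separate, which is why a single pass of greedy search is enough here; a real embedding model mixes information across the whole input, so nothing about this sketch's success rate carries over.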


yunwal 842 days ago

"Quite easily" isn't true in most cases, but embeddings are sometimes reversible. We know this because programs like Stable Diffusion sometimes output near-perfect copies of training data when given the correct prompt, and generation of that image is based on word and image embeddings alone.


dartos 842 days ago

I’ve never heard of reversible embeddings in practice.

In theory if you know the model being used you could reverse them.
