

by sfink 306 days ago

> Vector embeddings are lossy encodings of documents roughly in the same way a SHA256 hash is a lossy encoding.

Incorrect. With a hash, I need the identical input to know whether it matches; if I'm one bit off, I get no information. Vector embeddings, by design, map similar inputs to similar vectors, so if you can reproduce the embedding algorithm, you can tell how close you are to the input. It's like a combination lock that tells you how many numbers match so far (and, for the ones that don't, how close they are).
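
To make the contrast concrete, here is a rough sketch. `toy_embed` is a hypothetical stand-in (character trigram counts), not a real embedding model, and the example strings are made up; the only point is that a near-miss input scores as "close" under an embedding while SHA-256 says nothing useful (for unrelated digests, roughly half of the 256 bits typically differ).

```python
import hashlib
import math
from collections import Counter


def toy_embed(text, n=3):
    """Hypothetical stand-in for an embedding model: character n-gram counts."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hash_bits_differing(x, y):
    """Number of differing bits between the SHA-256 digests of x and y."""
    hx = int.from_bytes(hashlib.sha256(x.encode()).digest(), "big")
    hy = int.from_bytes(hashlib.sha256(y.encode()).digest(), "big")
    return bin(hx ^ hy).count("1")


secret = "the quick brown fox"   # made-up example input
near = "the quick brown fix"     # one character off
far = "completely unrelated"

for guess in (near, far):
    print(
        f"{guess!r}: hash bits differing = {hash_bits_differing(secret, guess)}, "
        f"embedding similarity = {cosine(toy_embed(secret), toy_embed(guess)):.2f}"
    )
```

The hash column looks about the same for the near miss and the unrelated string, while the similarity column separates them clearly. That similarity score is the "how many numbers match" signal from the combination lock.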

> It's virtually impossible to reverse the embedding vector to recover the original document.

If you can reproduce the embedding process, it is very possible (with a hot/cold type of search: "you're getting warmer!"). But also, you no longer even need to recover the exact original. You can recover something close enough (and spend more time to make it incrementally closer).
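
A minimal sketch of that hot/cold search, under loud assumptions: `toy_embed` is again a hypothetical n-gram stand-in rather than a real model, the attacker is assumed to know the input length and alphabet, and the search is a plain greedy per-character climb. Real inversion attacks are more sophisticated; this only shows that a similarity signal is enough to keep getting warmer.

```python
import math
import string
from collections import Counter

ALPHABET = string.ascii_lowercase + " "  # assumed known to the attacker


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: counts of 1-, 2- and 3-grams."""
    padded = f" {text} "
    grams = Counter()
    for n in (1, 2, 3):
        grams.update(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hot_cold_search(target_vec, length, max_passes=10):
    """Greedy search: at each position, keep the character that makes the guess warmest."""
    guess = [" "] * length
    best = cosine(toy_embed("".join(guess)), target_vec)
    history = [best]  # best similarity after each pass
    for _ in range(max_passes):
        improved = False
        for pos in range(length):
            best_char = guess[pos]
            for ch in ALPHABET:
                guess[pos] = ch
                score = cosine(toy_embed("".join(guess)), target_vec)
                if score > best:  # "you're getting warmer!"
                    best, best_char, improved = score, ch, True
            guess[pos] = best_char
        history.append(best)
        if not improved:  # no single-character change helps any more
            break
    return "".join(guess), history


secret = "hello world"        # made-up example input
target = toy_embed(secret)    # the attacker only sees this vector (plus the length)
recovered, history = hot_cold_search(target, len(secret))
print(repr(recovered), [round(s, 3) for s in history])
```

The greedy climb can stall short of the exact string, but the similarity never goes down between passes, and a stalled guess is still a "close enough" starting point for spending more time on a smarter search.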