by lrm242 5598 days ago
Do you store full copies of all data? What I mean is, if someone breaks into Greplin, can they effectively read all of my email assuming I've synced with Gmail? Or do you just index the data and reference sources using URLs?
1 comments

Even if they only have an index without the full copy, it's not that hard to reconstruct something close to the original from the index alone. An inverted index typically stores, for each term, not only the documents it appears in but also the word positions within each document, so you can put the terms back in order. However, it's not possible to restore the original exactly, because steps like stemming throw information away before the terms are indexed.
Also, in text indexing the original document is often kept anyway, for things like snippet generation.
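
To make the reconstruction point concrete, here is a rough sketch of a positional inverted index and how you would read a document back out of it. Everything in it is made up for illustration (the `stem`, `build_index` and `reconstruct` names, the toy "strip a trailing s" stemmer, the sample email); it says nothing about how Greplin or any real search engine lays out its index.

```python
from collections import defaultdict

def stem(word):
    # Toy stemmer (an assumption, not a real algorithm): lowercase,
    # drop surrounding punctuation, strip a trailing "s" from longer words.
    word = word.lower().strip(".,:;!?")
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def build_index(docs):
    # Positional inverted index: term -> doc_id -> list of word positions.
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.split()):
            index[stem(word)][doc_id].append(pos)
    return index

def reconstruct(index, doc_id):
    # Walk every term's postings, collect the positions recorded for
    # this document, then emit the terms in position order.
    slots = {}
    for term, postings in index.items():
        for pos in postings.get(doc_id, []):
            slots[pos] = term
    return " ".join(slots[p] for p in sorted(slots))

# Hypothetical sample document.
docs = {"mail1": "Meeting notes: the servers are down."}
index = build_index(docs)
print(reconstruct(index, "mail1"))
```

The word order comes back intact, but case, punctuation and the endings removed by the stemmer do not, which is the "similar but not exact" result described above.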