by adrian1973 3189 days ago
It sucks that you are probably correct.

My biggest issue with data retention is not that these companies collect all this data (their business models need it) but that they keep all of it forever, regardless of whether it could still serve any business purpose (chat conversations from a decade ago, for example).


I actually think chat conversations from a decade ago would be quite relevant. One baseline recommendation system is "people who bought X also bought Y". Consider the analogue: "people whose conversations fall in cluster X generally liked people in cluster Y". If chat conversations can be used to cluster users for better matching (and I think they can), they would be valuable to keep even if the content itself is of no interest.
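
To make the cluster idea concrete, here is a rough sketch of the "cluster X liked cluster Y" baseline. It assumes the clustering of chat text has already happened; the user names, cluster labels, and the `cluster_affinity` and `recommend_clusters` helpers are all made up for illustration, not anyone's actual system.

```python
from collections import Counter, defaultdict


def cluster_affinity(user_cluster, likes):
    """Count how often users in one cluster liked users in another."""
    counts = defaultdict(Counter)
    for liker, liked in likes:
        counts[user_cluster[liker]][user_cluster[liked]] += 1
    return counts


def recommend_clusters(counts, cluster, top_n=1):
    """Return the clusters most often liked by members of `cluster`."""
    return [c for c, _ in counts[cluster].most_common(top_n)]


# Hypothetical toy data: in practice the labels would come from
# clustering each user's chat history.
user_cluster = {"ann": "X", "bob": "Y", "cat": "X", "dan": "Y", "eve": "Z"}
likes = [("ann", "bob"), ("cat", "dan"), ("cat", "eve"), ("bob", "ann")]

affinity = cluster_affinity(user_cluster, likes)
print(recommend_clusters(affinity, "X"))
```

Note that this step only needs the cluster labels, not the message text, which is the sense in which the content itself can be "of no interest".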
> even if content is of no interest.

Can't they just keep (at most) the metadata?

As a data scientist, I think losing the actual words would be a loss. The words might only ever be fed to a word embedding model like word2vec, but keeping them lets you switch to a better embedding later.
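
A toy illustration of the point, with loud caveats: `embed_v1` and `embed_v2` are hypothetical stand-ins (plain word counts, then word counts plus word pairs), not word2vec or any real model. The only thing it is meant to show is that the second representation can be computed from the raw text but not from the stored first-generation vectors.

```python
def embed_v1(text, vocab):
    """Toy 'old' embedding: word counts over a fixed vocabulary."""
    words = text.lower().split()
    return [words.count(w) for w in vocab]


def embed_v2(text, vocab, pairs):
    """Toy 'newer' embedding: word counts plus adjacent word-pair counts."""
    words = text.lower().split()
    bigrams = list(zip(words, words[1:]))
    return embed_v1(text, vocab) + [bigrams.count(p) for p in pairs]


# Hypothetical stored messages and feature lists.
messages = ["not good", "good not"]
vocab = ["good", "not"]
pairs = [("not", "good")]

old_vectors = [embed_v1(m, vocab) for m in messages]
new_vectors = [embed_v2(m, vocab, pairs) for m in messages]

# Both messages collapse to the same v1 vector, so word order is gone
# if only the vectors were kept.
print(old_vectors)
# With the raw text still around, the v2 vectors tell them apart.
print(new_vectors)
```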