Hacker News
by sanxiyn 3190 days ago
I actually think chat conversations from a decade ago would be quite relevant. One baseline recommendation system is "people who bought X also bought Y". Consider the analogue: "people whose conversations fall in cluster X generally liked people in cluster Y". If chat conversations can be used to cluster users for better matching (and I think they can), they would be valuable to keep even if the content itself is of no interest.
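
To make the cluster idea concrete, here is a minimal sketch, not how any real matching service does it. It assumes each user already has a cluster label derived from their chat text, and that "liked" events are available as (liker, liked) pairs. The names `cluster_affinity`, `best_target_cluster`, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def cluster_affinity(user_cluster, likes):
    """Count how often users in cluster X liked users in cluster Y."""
    counts = defaultdict(Counter)
    for liker, liked in likes:
        counts[user_cluster[liker]][user_cluster[liked]] += 1
    return counts


def best_target_cluster(counts, cluster):
    """Return the cluster most often liked by members of `cluster`, or None."""
    if not counts.get(cluster):
        return None
    return counts[cluster].most_common(1)[0][0]


# Hypothetical toy data: in practice the labels would come from clustering chat text.
user_cluster = {"ann": "X", "bob": "Y", "cat": "X", "dan": "Y", "eve": "Z"}
likes = [("ann", "bob"), ("cat", "dan"), ("cat", "eve"), ("bob", "ann")]

counts = cluster_affinity(user_cluster, likes)
print(best_target_cluster(counts, "X"))
```

This is the same counting trick as "people who bought X also bought Y", just applied to pairs of clusters instead of pairs of products.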
1 comments

> even if content is of no interest.

Can't they just keep (at most) the metadata?

As a data scientist, I think losing the actual words would be a loss. The words would probably only be consumed through a word embedding like word2vec, but keeping the raw text lets you switch to a better embedding later.
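
A rough sketch of why the raw text matters, under loud assumptions: `embed_v1` and `embed_v2` are toy stand-ins I made up (hashed word counts and hashed character trigrams), not word2vec or any real model. The point is only that the second set of vectors can be computed because the text was kept; stored `embed_v1` vectors alone could not be converted.

```python
def embed_v1(text, dim=8):
    """Toy stand-in for a word-level embedding: hashed bag of words."""
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[sum(map(ord, word)) % dim] += 1.0
    return vec


def embed_v2(text, dim=8):
    """Toy stand-in for a later, different model: hashed character trigrams."""
    vec = [0.0] * dim
    s = text.lower()
    for i in range(len(s) - 2):
        vec[sum(map(ord, s[i:i + 3])) % dim] += 1.0
    return vec


# Hypothetical stored chat text, keyed by user.
raw_messages = {"ann": "hello there", "bob": "hi, how are you"}

# Original features, computed when the data was first collected.
vectors_v1 = {user: embed_v1(text) for user, text in raw_messages.items()}

# Later re-embedding with a different model; this needs the raw text.
vectors_v2 = {user: embed_v2(text) for user, text in raw_messages.items()}
```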