by jamesfe 3321 days ago
Well, maybe. I don't want to discount this concept entirely because some info is being "leaked" if that is the right word, but...

Let's say one of your contacts chats a lot because they're a chatty person. They're online far more than another person. What if that other person only chats on the bus on the way to and from work, at roughly the same time every day, to tell their wife they're on the way home? This activity will overlap with the chatty person's activity all the time.

By your rationale, they are having a conversation, maybe cheating, and maybe having a work affair.

I think the more active contacts a user has, the higher the probability that your model predicts they are having a "conversation" with another user. You'll probably find that your thresholds are really hard to fine-tune: maybe we say A chats with B if abs(A.activeTime - B.activeTime) < threshold, but that threshold is going to be super hard to find and even harder to validate.
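
To make the threshold rule concrete, here is a rough Python sketch. The names `naive_pairs`, `last_seen`, and `threshold_s` are made up for illustration, and the timestamps are invented; it is not anyone's actual model. It shows how the chatty contact gets paired with the bus rider simply because their activity happens to be close in time.

```python
from itertools import combinations

def naive_pairs(last_seen, threshold_s):
    """Flag every pair of users whose activity timestamps (in seconds)
    differ by less than threshold_s. Hypothetical rule, for illustration."""
    return {
        (a, b)
        for a, b in combinations(sorted(last_seen), 2)
        if abs(last_seen[a] - last_seen[b]) < threshold_s
    }

# Invented data: the chatty user and the bus rider are not talking to
# each other, but their activity falls within the same minute.
last_seen = {"chatty": 1000, "bus_rider": 1030, "other": 5000}
print(naive_pairs(last_seen, 60))
```

With a 60 second threshold the unrelated pair is flagged; with a 10 second threshold nothing is, which is the tuning problem in miniature.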

Sure, there is some information here (the picture probably being the most concretely weird) but the fact that you can just go to the App and check a box for privacy means that this seems like not a huge issue.

Yes, WhatsApp made the software, but it's your responsibility to apply your own privacy settings.

3 comments

If you check the model I described in my comment, it should filter out the "bus problem", since it detects a chat only if user A is chatting more than their baseline "bus time" probability when B is also chatting in the same time range. Add to this that people on WhatsApp usually do not chat at exactly the same minutes every day, and it is definitely possible to create a robust system for guessing, with good probability, whether two people often have conversations. Also note that the input phone numbers are not random: they belong to a connected circle of people. Add the fact that we can split the ranges further, potentially into a few minutes each, and you can even detect interesting things such as people having continuous chats with multiple persons, like teenagers. Another thing that is probably possible is "group detection", since a set of users will become active at the same time when new messages arrive.
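
A minimal sketch of that baseline comparison, assuming we have one observation per day and time bucket saying whether A and B were active. The function name `lift_by_bucket`, the bucket labels, and the data are all invented for illustration. It computes how much more likely A is to be active when B is active than A's usual rate for that bucket.

```python
from collections import defaultdict

def lift_by_bucket(observations):
    """observations: iterable of (bucket, a_active, b_active), one per day
    and time bucket. Returns {bucket: P(A | B) / P(A)} for buckets where
    both probabilities are defined. A value near 1 means no extra signal."""
    counts = defaultdict(lambda: [0, 0, 0, 0])  # n, a, b, a_and_b
    for bucket, a, b in observations:
        c = counts[bucket]
        c[0] += 1
        c[1] += int(a)
        c[2] += int(b)
        c[3] += int(a and b)
    lift = {}
    for bucket, (n, a, b, ab) in counts.items():
        if a > 0 and b > 0:
            lift[bucket] = (ab / b) / (a / n)
    return lift

# Bus rider vs. chatty user: both active at 18:00 every day for 10 days.
bus = [("18:00", True, True)] * 10
# A pair that is active together at 23:00 on 4 days and silent on 6.
pair = [("23:00", True, True)] * 4 + [("23:00", False, False)] * 6
lift = lift_by_bucket(bus + pair)
print(lift)
```

In this toy data the 18:00 overlap gives a ratio of 1 (A is always active then, regardless of B), while the 23:00 pair scores well above 1. Where to put the cutoff is still a judgment call.
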
I would think having a week or a month's worth of data is enough to filter out people who are just "background noise", and get an accurate enough "who's talking to who" mapping. (Something like, "A's and B's presence are within a minute of each other 80% of the time, even during 'off-peak' times, for example early in the morning.")
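
Something along these lines, as a rough sketch. The name `fraction_within`, the one-minute default, and the timestamps are assumptions for illustration; restricting the input to off-peak hours is left to the caller.

```python
from bisect import bisect_left

def fraction_within(a_times, b_times, window_s=60):
    """Fraction of A's presence timestamps (seconds) that have at least
    one of B's timestamps within window_s seconds."""
    b_sorted = sorted(b_times)
    if not a_times or not b_sorted:
        return 0.0
    hits = 0
    for t in a_times:
        i = bisect_left(b_sorted, t)
        # Only the nearest neighbours on either side of t can be closest.
        neighbours = b_sorted[max(i - 1, 0):i + 1]
        if any(abs(t - x) <= window_s for x in neighbours):
            hits += 1
    return hits / len(a_times)

# Invented data: four of A's five events have a B event within a minute.
a_times = [100, 500, 900, 1300, 5000]
b_times = [130, 520, 880, 1290, 9000]
print(fraction_within(a_times, b_times))
```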

We are worried about the NSA collecting metadata, but you dismiss this as an end-user problem. Famously, this is why Facebook makes its settings opt-out instead of opt-in: a high percentage of users never change the defaults.

> Yes, WhatsApp made the software, but it's your responsibility to apply your own privacy settings.

I do get caveat emptor, but a lot of people do not understand what their privacy settings actually imply.