Hacker News
by antirez 3174 days ago
A few weeks ago there was a similar discussion, and I commented the following:

If you think there is no problem, you are wrong. The blog post does not show all the information leaks that this implies. Example: I can modify the script to monitor all the numbers I have on my phone, so that, based on online/offline status, within a few weeks I can guess who is having conversations with whom, discovering cheating, work affairs, ... EDIT: Practical example. After collecting enough data about user X, I build a table of the probability of this user being online in each few-minute time range. Then I compare how often that user is online with the online status of another user Y. If the difference from the expected probability is significant, then I can suspect the two are chatting. Another thing I can use is the activation delay of the online status, since often X sends a message to Y and this results in Y coming online a few seconds later, and then the reverse.
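A minimal sketch of the probability-table idea described above, not the author's actual script. It assumes presence was sampled once per minute as hypothetical `(day, minute_of_day, online)` tuples, and it compares how often X and Y are online together with what independent per-bucket habits would predict. The function names and the 5-minute bucket width are illustrative choices.

```python
from collections import defaultdict


def online_probability(samples, width=5):
    """Return {bucket: P(online)} for (day, minute_of_day, online) samples."""
    seen = defaultdict(int)
    online = defaultdict(int)
    for _day, minute, is_online in samples:
        bucket = minute // width
        seen[bucket] += 1
        online[bucket] += int(is_online)
    return {b: online[b] / seen[b] for b in seen}


def co_online_excess(samples_x, samples_y, width=5):
    """Return (observed, expected) counts of X and Y being online together.

    `expected` assumes X and Y are independent given the time-of-day bucket,
    so habits like commuting at the same hour are already accounted for.
    """
    px = online_probability(samples_x, width)
    py = online_probability(samples_y, width)
    y_state = {(d, m): on for d, m, on in samples_y}
    observed = 0
    expected = 0.0
    for d, m, x_on in samples_x:
        if (d, m) not in y_state:
            continue  # only compare instants sampled for both users
        bucket = m // width
        observed += int(x_on and y_state[(d, m)])
        expected += px[bucket] * py[bucket]
    return observed, expected
```

If `observed` is well above `expected` over many days, the pair is a candidate for "chatting together"; deciding what counts as significant would need a proper statistical test, which this sketch leaves out.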

[then an HN user said she/he was not sure this was serious because maybe the users casually had similar patterns, so I replied:]

If you check the model I described in my comment, it should filter out the "bus problem", since it detects a chat only if user A is online more often than A's standard "bus time" probability predicts when B is also online in the same time range. If you add to this that people on WhatsApp usually do not talk at the exact same minutes, it is definitely possible to create a robust system for guessing, with good probability, whether two people often have conversations. Also note that the input phone numbers are not random: they belong to a connected circle of people. Add to this the fact that we can split the ranges, potentially, into just a few minutes each, and you can even detect interesting stuff for people having continuous chats with multiple people, like teenagers. Another thing that is probably possible is "group detection", since on new messages a set of users will activate at the same time.
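The "group detection" idea can be sketched roughly as follows, assuming a hypothetical log of offline-to-online transitions as `(timestamp_seconds, user)` pairs. The code counts how often each pair of users comes online within a short window of each other; the 10-second window is an arbitrary assumption, and pairs with unusually high counts are only candidates for sharing a group.

```python
from collections import Counter


def co_activation_counts(events, window=10):
    """Count, per pair of users, how often both came online within `window` seconds.

    events: list of (timestamp_seconds, user) offline->online transitions.
    Returns a Counter keyed by frozenset({user_a, user_b}).
    """
    events = sorted(events)
    counts = Counter()
    start = 0  # index of the oldest event still inside the window
    for i, (t, user) in enumerate(events):
        while events[start][0] < t - window:
            start += 1
        for _t_prev, other in events[start:i]:
            if other != user:
                counts[frozenset((user, other))] += 1
    return counts
```

A user whose status flaps quickly would be counted several times in one window, so a real analysis would probably deduplicate activations first and compare the counts against the per-user baseline.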

[And the attack can be refined a lot with more powerful mathematical approaches]

6 comments

Once I hacked together a similar program to log the terminal activity of fellow students on the university UNIX and Linux servers: https://github.com/andyn/actspy/blob/master/actspy.c

The main objective, however, was not to stalk innocent users but to catch an anonymous IRC troll who was using an identless shell server in order to hide their real account name. Every time the troll wrote to IRC, the activity logger program showed typing activity from a certain user. After a few message exchanges during quiet night hours I was able to reliably pinpoint them.

So what happened after finding out who it was?
That'll be the boring part of the story. I just /msged her primary nick and asked nicely if it was possible to stop. Apparently the threat of losing anonymity is enough to turn trolls back to normal people.
This is, in a nutshell, the secret of the internet.
The troll was never heard from again. He was made to "disappear" :D
I'm using an Xposed mod to be continually online on WhatsApp, even when my phone's screen is off. Would this thwart such attacks? I mean, it would be an anomaly, but other than that I don't know what other information you could get, except the last Online/Offline never changing.

It confuses my friends though, who write to me more at night than usual; luckily I have a DnD mode that saves me from waking up. (It confuses my gf too.)

Please finish this story.
This isn’t just necessarily a problem with WhatsApp. The same applies to IRC, if you set away states.

Even if you don’t set away states, one can simply monitor every channel you’re in, every message you send, and then quickly determine what timezone you’re in, when you sleep, when you’re on vacation, etc.

Here’s an example graph of a user, every dot is a message: https://i.imgur.com/DrgVvVw.png and here one from a user with more regular sleep patterns: https://i.imgur.com/a1xdSqR.png (notice the timezone transition when daylight savings time starts? And notice how the user takes about 2 weeks to adjust?)
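A rough sketch of how such a pattern could be read from timestamps alone, assuming a hypothetical list of message times as Unix timestamps. It builds an hour-of-day histogram in UTC and finds the quietest contiguous block of hours, which is a plausible guess at the sleep window; the 8-hour length is an assumption, not something taken from the graphs.

```python
from datetime import datetime, timezone


def hourly_histogram(timestamps):
    """Count messages per UTC hour of day (index 0..23)."""
    hist = [0] * 24
    for ts in timestamps:
        hist[datetime.fromtimestamp(ts, tz=timezone.utc).hour] += 1
    return hist


def quietest_window(hist, length=8):
    """Return the UTC start hour of the `length`-hour block with the fewest messages.

    The block wraps around midnight; ties resolve to the earliest start hour.
    """
    return min(
        range(24),
        key=lambda start: sum(hist[(start + k) % 24] for k in range(length)),
    )
```

Comparing the start of the quiet block with typical sleeping hours gives a crude hint at the user's UTC offset, and a shift in that start over the year is the kind of daylight-saving transition visible in the second graph.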

In chat applications, those features were the first to get disabled. As I recall, one of the MSN Messenger features required you to sign in before you disabled it.

Anyhow, I'd disable showing online status, typing status, or automatically changing status based on activity.

This was a decade and a half ago, probably longer. The principle remains the same. No, no I don't want you to know when I'm in front of my computer, typing, or otherwise. If I want to appear online, I'll manually do so.

Those graphs are determined purely from messages - no away states, nothing

There's no way to disable that.

See also: timestamps when HN comments are posted.

This is much more interesting because pretty much everyone only participates in discussions when posts are on the front page - it would be tough to schedule/delay a post and stay relevant. Also, the lock-in after 1 hour (or a reply) preventing deletion is huge.

Some HN participants are now kind of "whales" in the startup community - at the very least, this info could be used to schedule cold-pitch emails! (And this is across the entire archive of past users, not just current users. These habits need not necessarily change much.)

Timestamp metadata is all over the place - GitHub activity graph, blog post comments, etc. -- merging timestamps for the same person across their accounts on all the different services offers amazing insights.

The only way to "disable" this is to schedule things or provide garbage data (only when user input is given precedence - like with this tool: https://www.laurencegellert.com/software/github-graph-builde...).

Github activity profiles.
Reason #32* to avoid the application.

* Entirely made up number.

If you're in a group chat, anyone that is in the same chats can see all your messages in there.

That's the one and simple trick.

Correct, but this doesn't appear to be group chat. This appears to be individual chat and the timing attacks based on their online presence. Group chat, by default, is going to seriously hinder one's ability to remain private.
Way back when, in 1998, when ICQ was a thing, I had a scriptable ICQ client on my Amiga. It was fairly trivial to write a quick program to tell it to change status at random times, to confuse people as to my whereabouts.
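The original Amiga script is not shown, so the following is only an illustration of the idea: a generator for a random status schedule with exponentially distributed gaps. The status names, the mean gap, and the function name are all assumptions, and actually applying the statuses would depend on whatever scripting interface the chat client offers.

```python
import random


def random_status_schedule(duration_s, mean_gap_s=1800,
                           statuses=("online", "away", "offline"), seed=None):
    """Return [(time_offset_seconds, status), ...] with random change times.

    Gaps between changes are drawn from an exponential distribution with
    mean `mean_gap_s`; each new status differs from the previous one.
    """
    rng = random.Random(seed)
    t = 0.0
    current = None
    schedule = []
    while True:
        t += rng.expovariate(1.0 / mean_gap_s)
        if t >= duration_s:
            break
        current = rng.choice([s for s in statuses if s != current])
        schedule.append((t, current))
    return schedule
```

Purely random toggling hides the real pattern from casual observers, although an always-random profile is itself an anomaly, as noted elsewhere in the thread.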
Yikes, that second person is almost robotic in their sleep patterns.
Actually, most normal working people are similar to the second person.

Only a few people have sleep patterns like mine (the first, erratic graph), and I have them because I often spend my nights working on projects, trying to build new products, and once I've started one, it's hard to stop.

I don't think one of them can be a working person. Both are texting all the time while they're awake.
Both of them are working, and have full jobs. The texts captured included technical support channels, slack-irc bridges, and more.
wow! I find the images creepy and amazing at the same time :)

PS: It's not so much the images themselves but what they mean i.e. this analysis :)

Most Tor busts follow a similar pattern, watching both ends of the connection.

There is a real need for a "tor delay" metadata-disruption-as-a-service, where random strangers invoke one another's web callbacks and report back the result in exchange for Bitcoin (Strangers on a Train -style). Someone put it on the block chain and start an ICO!

Same risk as hosting an exit node. It could take a while to convince the police that you obfuscated traffic patterns without any involvement in the crimes they're after. I think most people wouldn't want to take that risk.
Either you are missing the point or I am.

As I understand it, random strangers are logged on to Tor, invoke each other's callbacks, and give back results. Since all of them are anonymized, this is not at all similar to an exit node.

The only purpose of this is to make Tor packet traffic patterns hard to follow :)

This could be a service, but I'm not sure whether the snooper could filter it out. These would be one-off requests from random nodes and would not affect your Tor traffic pattern much, because I posit that the signal-to-noise ratio of your main activity would be pretty high. Hmm, perhaps if we jacked up this random traffic, that would hide your main traffic.

Does anyone who knows such analyses want to chime in? :)

The thing is, this method works pretty well if people are chatting in real time. If you wait something like 10 minutes to answer messages, it is much more difficult to create the links.

Moreover, if people are using WhatsApp all the time, it is again much more difficult to do.

But I agree with you, there are many situations where this could work.

Unfortunately, even under much more noise than the WhatsApp activation patterns, we have seen timing attacks work in incredibly reliable ways, with the network in the middle adding random delays, and even when the task at hand was to measure very small differences in time. So I guess that if this attack already seems feasible in certain contexts, it can only get much better using more advanced techniques.
A similar indirect way can be used to extract information out of Google's database. For example, launch an ad-campaign for any product, directed at people who love cats. Now if people click on the ad and buy the product, you know they must love cats.
This might seem like a brilliant idea, but you'll run out of money before you map all people for all things.

If you had that much money (spitballing here), you could buy Google itself, I think.

There are easier ways to get data on people like their social profiles, and other online breadcrumbs like yelp reviews, any digital footprint really.

Another way is to buy databases of people. People have databases of HNIs, etc that you can purchase. This of course doesn't lend itself to much analysis but if the main purpose was to market to them or something like that, then databases work best :)

I tried to do something like that over 2 years ago. I never got to work on the analysis part. I still have a 2GB database of online/offline states, statuses (the text status thingy) and profile picture changes. Someday I'll get back to that data and analyze it.