by sopooneo 2709 days ago
How can the server component of WhatsApp tell that the client is using the Python library you mention? If that library implements the protocol correctly, shouldn't it be indistinguishable from the official web client to the server?
4 comments

Creating a correct implementation of a protocol is one thing; creating one that is indistinguishable from another implementation is another. E.g. for TCP/IP you can identify the OS just from incidental details such as the initial TTL and window size, even though every implementation is correct [1]. I guess they are doing something similar.

And even if your protocol implementation were perfectly indistinguishable from the official one, usage patterns might reveal differences. E.g. a bot will most likely reply immediately after you issue a command. If you add a constant wait time, it's still distinguishable from humans, who take varying amounts of time to type. The same goes for uniformly random wait times: I'm sure there is a detectable difference (I'd guess a human's delay is correlated with message length, for example). All of this is visible without looking at the message contents, which are obviously not available to the WhatsApp service.

[1]: https://en.wikipedia.org/wiki/TCP/IP_stack_fingerprinting
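
To make the timing argument concrete, here is a rough sketch of how a server could tell these cases apart from metadata alone. Everything in it is an assumption for illustration: the delay models, the typing speed of 0.2 s per character, and the function names are made up, and nothing here is known to be what WhatsApp actually does.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series has no variance."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def simulate_human(lengths, rng):
    # Hypothetical model: some thinking time plus typing time per character.
    return [rng.uniform(1.0, 3.0) + 0.2 * n for n in lengths]


def simulate_bot_uniform(lengths, rng):
    # Bot that waits a uniformly random time, ignoring message length.
    return [rng.uniform(1.0, 10.0) for _ in lengths]


def simulate_bot_constant(lengths):
    # Bot that always waits the same fixed time.
    return [2.0 for _ in lengths]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
lengths = [rng.randint(5, 200) for _ in range(500)]

human_r = pearson(lengths, simulate_human(lengths, rng))
uniform_r = pearson(lengths, simulate_bot_uniform(lengths, rng))
constant_r = pearson(lengths, simulate_bot_constant(lengths))

print(f"human:    r = {human_r:.2f}")
print(f"uniform:  r = {uniform_r:.2f}")
print(f"constant: r = {constant_r:.2f}")
```

Under these assumptions the human delays correlate strongly with message length, while both bot strategies show essentially no correlation, and only message lengths and timestamps were needed to see it.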

Based on your parent comment...

> This is really sad for people like me who don't want to install WhatsApp on their phones because it sends too much of my data to Facebook... probably that's precisely why they are so strict, to get all the data because that's all they get (the service is gratis after all).

When you send a message through WhatsApp, WhatsApp knows you sent the message and who you are.

Since they know who you are, they also know, independently of the request you just sent, whether you're using their official client, and what data they've managed to extract from your phone.

If I get a seemingly-valid message from you despite the fact that I know perfectly well you've never installed my official client, I'm going to conclude you're not using my official client.

This was an incredibly interesting topic when Pokémon Go came out. People started using the reverse-engineered protocol to gain information about the map from a PC (instead of the official Android app), and Niantic updated the app every now and then with slight protocol changes and complex functions or crypto algorithms so that it was harder for those people to perfectly emulate the client. The unofficial clients still worked, but Niantic could sometimes detect them because their traffic was not identical to the official app's.
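
As a purely illustrative sketch of that cat-and-mouse game (this is not Niantic's actual scheme; the secrets, payload, and function names are all hypothetical), imagine each request carrying an HMAC computed with a secret that changes with every app update. A client still using the old secret produces requests that parse fine but fail the check:

```python
import hashlib
import hmac

# Hypothetical secret embedded in the current official app build.
CLIENT_SECRET = b"hypothetical-secret-v2"


def sign_request(secret: bytes, payload: bytes) -> str:
    """Compute the signature a client would attach to a request."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def looks_official(payload: bytes, signature: str) -> bool:
    """Server-side check: does the signature match the current secret?"""
    expected = sign_request(CLIENT_SECRET, payload)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, signature)


payload = b'{"action":"get_map","lat":0,"lon":0}'

# Official client: signs with the current secret.
official_sig = sign_request(CLIENT_SECRET, payload)

# Unofficial client: still signs with a secret extracted from an older build.
stale_sig = sign_request(b"hypothetical-secret-v1", payload)

print(looks_official(payload, official_sig))
print(looks_official(payload, stale_sig))
```

The server does not have to reject the stale request; it can answer normally and quietly flag the account, which fits the "it worked, but they could detect them sometimes" behaviour described above. The obvious weakness is that the secret ships inside the app, so it can be extracted again after each update.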

I love this topic. Trying to protect an API so that it can only be used by the official app (or at least so that other clients can be detected) seems like an exciting challenge. I don't know if there even is a 'solution' to this problem. I would like to read more about it.