Hacker News
by WillSmithPro 1685 days ago
I came across this Twitter thread during the last Facebook outage. Apparently something is going very wrong there with their backups. I would definitely check for similar cases. Confirmed by multiple people :/

I quote the author:

This is really weird. In #WhatsApp, I started to see messages that I know 100% that I deleted 2 days ago?! WTF is happening there? I think this is a really big violation of privacy! I see the messages from a month ago, with my disappearing messages setting turned on?! Gosh

https://twitter.com/pytlicek/status/1445072626729242637?s=21

7 comments

I don't think services should be allowed to mislead their users in this way. If 'delete message' does not delete the message then it should not be labelled as such.

The option should be labelled as 'Hide Message*' together with a link to an explanation of the feature & its limitations.

OT: I saw a similar thing (deleted messages returning) in Messages on macOS a couple of weeks ago, shortly after I upgraded to Monterey. A whole bunch of deleted messages from 2017 through 2019 returned. Nothing from earlier or later returned.

I mentioned this on Reddit and someone replied that they saw it too with hundreds of deleted messages going back to 2015 returning.

How is this a violation of privacy… that’s not how e2e encryption works.
E2E encryption does nothing to ensure deletion. WhatsApp can simply re-deliver the same encrypted blob.
That’s called a replay attack and is absolutely something e2e encryption protects against
There are actually three different things: replays, reloaded messages, and delayed messages. Replays are impossible in the Signal protocol, so that’s not what happened. Delayed messages are part of Signal: you can receive message2 before message1. A reloaded message is probably what happened; Signal does not prevent that, since “deleting a message” is not something Signal specifies.
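To make the replay vs. delayed-message distinction concrete, here is a rough sketch loosely modeled on a symmetric-key ratchet (a KDF chain built from HMAC-SHA256). It is not WhatsApp's or Signal's actual code; `ReceivingChain` and `key_for` are made-up names, and a real implementation would also cap how many skipped keys it keeps. The point is only that each message number gets a one-time key: a late message still finds its stored key, while a repeated message number finds nothing.

```python
import hashlib
import hmac


class ReceivingChain:
    """Hypothetical receiver side of a one-way key chain (illustration only)."""

    def __init__(self, chain_key: bytes):
        self.chain_key = chain_key
        self.next_n = 0     # number of the next message key to derive
        self.skipped = {}   # message number -> key kept for a late arrival

    def _step(self) -> bytes:
        # Derive one message key, then advance the chain so it cannot be re-derived.
        message_key = hmac.new(self.chain_key, b"\x01", hashlib.sha256).digest()
        self.chain_key = hmac.new(self.chain_key, b"\x02", hashlib.sha256).digest()
        self.next_n += 1
        return message_key

    def key_for(self, n: int):
        """Return the one-time key for message n, or None if it was already used."""
        if n in self.skipped:
            return self.skipped.pop(n)   # delayed message: key was kept, use it once
        if n < self.next_n:
            return None                  # key already consumed: treat as a replay
        while self.next_n < n:
            idx = self.next_n
            self.skipped[idx] = self._step()   # remember keys for messages not yet seen
        return self._step()


chain = ReceivingChain(b"\x00" * 32)        # placeholder shared secret
late_ok = chain.key_for(1) is not None      # message 1 arrives first
early_ok = chain.key_for(0) is not None     # message 0 arrives late, still accepted
replay_rejected = chain.key_for(1) is None  # message 1 again: no key left
```

None of this says anything about deletion: if the client simply re-reads a message it already decrypted and stored, no key check is involved at all.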
e2e encryption means the party in the middle does not have the key to the data. It is somewhat of a misnomer since it is a feature of key-agreement more than a feature of encryption.

Any other features are dependent on the protocol that uses the secret key. You will generally see an encryption method that is protected against cipher-text manipulation, but e2e does not guarantee that. Similarly, a protocol that uses e2e encryption can add replay protections, but it is not at all a feature inherent in e2e.

I could well imagine that WhatsApp has some replay protection built in. I could similarly imagine they have a way to override that in case they need to. Heck, perhaps the replay protection is implemented with WhatsApp as the ultimate arbiter of what counts as a replay. As long as WhatsApp does not know the key used to encrypt my messages, the encryption is e2e in my book.

WhatsApp also shows certain messages as "forwarded many times".
That is done on the client side. Basically, you have a `ForwardCount` and if it is > 5, it shows that message. No need for breaking E2E here.
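A minimal sketch of how a client-side counter like that could work, assuming the count travels inside the encrypted payload so the server never reads it. The `Message` class, the `forward` helper, and the plain "Forwarded" label for lower counts are illustrative assumptions, not WhatsApp's real data model; only the "> 5" threshold comes from the comment above.

```python
from dataclasses import dataclass, replace

FORWARD_LABEL_THRESHOLD = 5  # per the comment above: label appears when count > 5


@dataclass(frozen=True)
class Message:
    body: str
    forward_count: int = 0  # hypothetical field inside the encrypted payload


def forward(msg: Message) -> Message:
    """Build the copy the sending client would encrypt: same body, count + 1."""
    return replace(msg, forward_count=msg.forward_count + 1)


def label(msg: Message) -> str:
    """Decide what the receiving client shows, using only decrypted local data."""
    if msg.forward_count > FORWARD_LABEL_THRESHOLD:
        return "Forwarded many times"
    if msg.forward_count > 0:
        return "Forwarded"  # assumed label for low counts
    return ""
```

Since each client increments and reads the counter after decryption, the server only ever sees ciphertext. The flip side is that the count is only as trustworthy as the clients that set it.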
I completely forgot about this issue :) I'm glad someone was interested in looking at how those backups work. Perhaps my complaints also contributed to this investigation :)))
These look like messages being re-sent from the service to the client.

This is not surprising - when you ask someone else to route messages for you, even encrypted messages, you are giving them the (encrypted) payload and asking them to route it for you.

If you have a large network with billions of users, it's reasonable that some of the users' phones may be offline some of the time.

Should the service just drop messages on the floor when that happens, or buffer them in some queue (recall, they're E2EE) that gets emptied every so often?

Now assume all your infra has a hiccup (outage) and goes offline, and then comes online again.

Probably the retry logic didn't sync correctly and attempted to retransmit encrypted messages that had already been delivered.

In short, for distributed computing at scale, it is surprisingly difficult to ensure a message is delivered exactly once.
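A toy simulation of that failure mode, assuming an at-least-once queue where the server keeps each encrypted blob until it sees an acknowledgement. `Server`, `Client`, and their methods are hypothetical names for illustration, not anything WhatsApp has published. If the ack is lost (say, during an outage), the server delivers again, and the only thing that keeps the message from showing up twice is the client remembering the IDs it has already seen.

```python
class Server:
    """Hypothetical relay: holds opaque blobs until the client acknowledges them."""

    def __init__(self):
        self.pending = {}  # msg_id -> encrypted blob the server cannot read

    def enqueue(self, msg_id, blob):
        self.pending[msg_id] = blob

    def deliver(self):
        # At-least-once: everything not yet acked is sent (again).
        return list(self.pending.items())

    def ack(self, msg_id):
        self.pending.pop(msg_id, None)


class Client:
    """Hypothetical receiver that makes delivery idempotent by message ID."""

    def __init__(self):
        self.seen = set()
        self.inbox = []

    def receive(self, msg_id, blob):
        if msg_id in self.seen:
            return False  # duplicate delivery, ignore
        self.seen.add(msg_id)
        self.inbox.append(blob)
        return True


server, client = Server(), Client()
server.enqueue("m1", b"opaque-blob")

for msg_id, blob in server.deliver():
    client.receive(msg_id, blob)      # delivered, but the ack never reaches the server

for msg_id, blob in server.deliver():
    client.receive(msg_id, blob)      # redelivered after the "outage"
    server.ack(msg_id)                # this time the ack gets through
```

Note that `seen` has to outlive the message itself for this to help: a client that forgets an ID when the user deletes the message will happily accept the retransmission.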
I'm not sure if that explains why deleted messages from months ago are being resurrected. That would imply that there is a persistence framework that has multi-month readback capability.
The oldest message from the Twitter screenshot looks ~8 days old.

In the second tweet the user says "3 chats before the outage and now 15+ or more chats which I deleted before the week or two."

Two weeks (and in screenshots, only 8 days shown) does not seem surprising. Especially given the increasing rate of internet shutdowns across the globe [1].

E2EE is too important to play fast and loose with.

[1] "In 2020, Access Now and the #KeepItOn coalition documented at least 155 internet shutdowns in 29 countries." (https://www.accessnow.org/keepiton/)

The standard fix for this is to give each message a GUID, and then when you delete a message, instead of deleting it, you store a tombstone with the GUID, and that prevents a network re-send from causing the message to reappear.
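A minimal sketch of the tombstone idea, assuming an in-memory store on the client. `MessageStore` and its methods are hypothetical names, and a real client would persist the tombstones and eventually expire them.

```python
import uuid


class MessageStore:
    """Hypothetical local store that remembers which GUIDs were deleted."""

    def __init__(self):
        self.messages = {}       # guid -> message body
        self.tombstones = set()  # guids of deleted messages

    def receive(self, guid, body):
        """Store an incoming message unless it was deleted earlier."""
        if guid in self.tombstones:
            return False  # a re-send of a deleted message is dropped
        self.messages[guid] = body
        return True

    def delete(self, guid):
        """Drop the body but keep the GUID as a tombstone."""
        self.messages.pop(guid, None)
        self.tombstones.add(guid)


store = MessageStore()
guid = str(uuid.uuid4())
store.receive(guid, "see you at 8")
store.delete(guid)
resurrected = store.receive(guid, "see you at 8")  # server re-sends the same message
```

After the re-send, `resurrected` is `False` and the message stays gone. The cost is that the tombstone itself has to be kept around for at least as long as a re-send is possible.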
Sorry, what’s a tombstone? Like add a field that says “deleted”?
Yes, it's a concept used e.g. in hash tables that use open addressing [0], where you can't delete value X because the final address of value Y may depend on whether X was present when it was being inserted. So instead of deleting X and leaving behind an empty address we change X to a tombstone that stays there forever and says "something was here and if you're looking for Y, Z or anything else, keep on looking".

[0] https://en.wikipedia.org/wiki/Hash_table#Open_addressing

Thank you
Or store the message encrypted along with its key, then just delete the key.
This happened to me; I can still see messages that I deleted yesterday.
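A rough sketch of that "delete the key" approach (sometimes called crypto-shredding), assuming one random key per message. To keep it standard-library only, the cipher here is a one-time-pad XOR, which is a stand-in: a real system would use an authenticated cipher from a proper crypto library. `ShreddableStore` and its methods are hypothetical names.

```python
import secrets


class ShreddableStore:
    """Hypothetical store: ciphertexts may be copied anywhere, keys are kept apart."""

    def __init__(self):
        self.ciphertexts = {}  # msg_id -> ciphertext (may end up in backups)
        self.keys = {}         # msg_id -> key (must be erasable for real)

    def put(self, msg_id, plaintext: bytes):
        # Toy cipher: XOR with a fresh random key as long as the message.
        key = secrets.token_bytes(len(plaintext))
        self.keys[msg_id] = key
        self.ciphertexts[msg_id] = bytes(p ^ k for p, k in zip(plaintext, key))

    def get(self, msg_id):
        key = self.keys.get(msg_id)
        if key is None:
            return None  # key destroyed (or never existed): message unreadable
        ciphertext = self.ciphertexts[msg_id]
        return bytes(c ^ k for c, k in zip(ciphertext, key))

    def delete(self, msg_id):
        # Only the key is removed; leftover ciphertext is useless without it.
        self.keys.pop(msg_id, None)


vault = ShreddableStore()
vault.put("m1", b"delete me later")
before = vault.get("m1")
vault.delete("m1")
after = vault.get("m1")
```

This only helps if the keys themselves never land in the same backups or queues as the ciphertexts; otherwise a restored key brings the message back just the same.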
Durov posting about this in his Telegram channel in 3... 2... 1...
What is Durov and why is it relevant?
He's one of Telegram's creators, and he likes to post long rants on his Telegram channel whenever something like this happens.
It’s not.
what channel is that?
All of them.