by tptacek 1986 days ago
We don't know that Signal doesn't store data about users on its servers. Even the source code can't tell us that, because we don't run the servers.

What we do know is that programs like Telegram have to store data about users on their servers, by design. A big difference between the two projects is that Signal is carefully designed to minimize the amount of data the service needs to operate. That's why identifiers are phone numbers, so that it can piggyback on the contact lists you already keep on your phone.

By contrast, other services store, in effect, a durable list of every person you communicate with, usually indexed in a plaintext database.


> We don't know that Signal doesn't store data about users on its servers. Even the source code can't tell us that, because we don't run the servers.

Yes. Ultimately we have no choice but to trust trust itself.[a] That said, if the OP were a non-technical friend asking me the same question, I would respond more or less like this:

"Of all the widely used messaging services, Signal is the only one known to be designed to minimize the amount of user data needed to operate, and all indications are that they are operating as designed[b], so Signal is likely your best choice today if privacy is your main concern."

[a] http://users.ece.cmu.edu/~ganger/712.fall02/papers/p761-thom...

[b] https://news.ycombinator.com/item?id=25764526

I've seen [a] on HN three times in the last week. I don't know if it's recency bias, the Baader-Meinhof phenomenon, a genuine rise in popularity, or a sign of the times.

Edit: after a quick Algolia search, it has indeed been posted much more this year than in previous years.

It's also been relevant recently because of the SolarWinds supply-chain hack, where the implant was inserted into the build process, so I've been seeing it a lot more as well. It wasn't used to infect a compiler, but it still makes people think of Trusting Trust.

Signal has reproducible builds for Android: https://signal.org/blog/reproducible-android/
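As a rough illustration of what reproducible builds buy you: if you build the APK yourself from the published source, you can compare it entry by entry with the copy installed from the Play Store, skipping the signature files that only the official, signed copy carries. This is a hypothetical helper written for this thread (`builds_match` and the exact set of skipped `META-INF/` names are assumptions), not Signal's own verification tool; the blog post describes the real procedure.

```python
import hashlib
import zipfile

# Assumption for this sketch: these META-INF/ entries exist only because the
# store copy is signed, so they are excluded from the comparison.
SIGNATURE_SUFFIXES = (".SF", ".RSA", ".DSA", ".EC")


def is_signature_entry(name: str) -> bool:
    """True for JAR-style signing files under META-INF/."""
    if not name.startswith("META-INF/"):
        return False
    return name == "META-INF/MANIFEST.MF" or name.endswith(SIGNATURE_SUFFIXES)


def entry_digests(apk_path: str) -> dict:
    """Map each non-signature entry in the APK (a ZIP archive) to its SHA-256."""
    digests = {}
    with zipfile.ZipFile(apk_path) as apk:
        for name in apk.namelist():
            if is_signature_entry(name):
                continue
            digests[name] = hashlib.sha256(apk.read(name)).hexdigest()
    return digests


def builds_match(store_apk: str, local_apk: str) -> bool:
    """Same set of entries with byte-identical contents, signatures aside."""
    return entry_digests(store_apk) == entry_digests(local_apk)
```

A match only tells you that the client you run corresponds to the published source; it says nothing about what the server keeps.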

Does that help in any way to verify that they do not store data on their servers?

My understanding: if you verify the safety numbers in person, you can be confident that the conversation is end-to-end encrypted. If the safety numbers differ, a malicious actor could be listening in.

Someone please correct me if I'm wrong.

Edit: That said, I believe they could still record IP addresses, as well as the recipient and timestamp of each message.
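To make the safety-number idea concrete: the number shown on each phone is built from both participants' public identity keys, so if an attacker substituted their own key on either side, the two phones would display different numbers. This sketch is only loosely modeled on that idea; the function names, the 5200-round iterated SHA-512, and the digit encoding are assumptions for illustration, not a verified copy of Signal's algorithm.

```python
import hashlib

ITERATIONS = 5200  # assumed iteration count; makes searching for a matching key costlier


def fingerprint_half(identity_key: bytes, identifier: bytes) -> str:
    """30 decimal digits derived from one party's public key and identifier."""
    digest = hashlib.sha512(b"\x00\x00" + identity_key + identifier).digest()
    for _ in range(ITERATIONS):
        digest = hashlib.sha512(digest + identity_key).digest()
    groups = []
    for offset in range(0, 30, 5):
        # Read 5 bytes as an integer and keep 5 decimal digits of it.
        chunk = int.from_bytes(digest[offset:offset + 5], "big")
        groups.append(f"{chunk % 100000:05d}")
    return "".join(groups)


def safety_number(my_key: bytes, my_id: bytes, their_key: bytes, their_id: bytes) -> str:
    """60 digits; sorting the halves makes both phones show the same string."""
    halves = sorted([fingerprint_half(my_key, my_id), fingerprint_half(their_key, their_id)])
    return "".join(halves)
```

Comparing the strings in person works because a man in the middle would have to hand each phone a key of its own, which changes at least one half. It protects message contents only; it does nothing to hide metadata such as IP addresses or timestamps.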

If they were storing that, it would have been produced when they were forced to hand over all data relevant to the case.

Agreed. I'm just pointing out what information they have access to if they wanted to start logging as much as they could.

Sadly, I don't see any way to prove that over time except through periodic court orders :)

It only helps verify what data the client sends to their servers, not what fraction of that data is stored on their servers. They could be (but probably aren't; see other comments) storing, e.g., information about how often you connect and the volume of data that passes through their servers.

We don't really know, but Intel SGX gives some assurance that the server is running the code they say it is.

https://signal.org/blog/private-contact-discovery/

I just skimmed the post, and it seems to say that SGX lets them process user data without the OS having access to it. That does nothing to let me verify what is running on their servers, or even whether they are using SGX at all.

At best, it protects Signal from hackers or a malicious datacenter provider.

I don't think you skimmed it very carefully.

> SGX enclaves also support a feature called remote attestation. Remote attestation provides a cryptographic guarantee of the code that is running in a remote enclave over a network.

> Originally designed for DRM applications, most SGX examples imagine an SGX enclave running on a client. This would allow a server to stream media content to a client enclave with the assurance that the client software requesting the media is the “authentic” software that will play the media only once, instead of custom software that reverse engineered the network API call and will publish the media as a torrent instead.
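A toy model may help show what the quoted attestation step buys the client: the hardware reports a measurement (a hash) of the code loaded into the enclave, bound to a fresh nonce from the client, and the client proceeds only if that measurement equals the hash of the code it expects to be running. Everything here is a simplification made up for illustration. Real SGX quotes are signed with Intel-rooted asymmetric keys and checked through Intel's attestation infrastructure, whereas this sketch uses a shared HMAC key (`PLATFORM_KEY`, hypothetical) as a stand-in that a real client could never be trusted to hold.

```python
import hashlib
import hmac
import secrets

# Hypothetical stand-in for a key held by the CPU. In real SGX the quote is
# signed asymmetrically, so verifiers cannot forge quotes themselves.
PLATFORM_KEY = secrets.token_bytes(32)


def measure(enclave_code: bytes) -> bytes:
    """The 'measurement' is a hash of exactly the code loaded into the enclave."""
    return hashlib.sha256(enclave_code).digest()


def make_quote(loaded_code: bytes, nonce: bytes) -> tuple:
    """Server side: the platform vouches for what is loaded, bound to the client's nonce."""
    measurement = measure(loaded_code)
    tag = hmac.new(PLATFORM_KEY, measurement + nonce, hashlib.sha256).digest()
    return measurement, tag


def verify_quote(measurement: bytes, tag: bytes, nonce: bytes, expected: bytes) -> bool:
    """Client side: check the platform's tag, then compare against the expected code."""
    good_tag = hmac.new(PLATFORM_KEY, measurement + nonce, hashlib.sha256).digest()
    return hmac.compare_digest(tag, good_tag) and hmac.compare_digest(measurement, expected)
```

In this model the client refuses to upload anything unless `verify_quote` returns `True`. Note that attestation covers only the code inside the enclave, not the rest of the server.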

I can't trust any company that has to read my contact list, period! It's not something anyone should ever have to share.

Signal does not need access to your contacts. It does ask for the contacts permission, so that it can show the names you have saved for your contacts in the app, but you can just answer no and everything still works.

By contrast, if you give WhatsApp the same answer, it flatly refuses to work. Yet it has already created an account on their servers, and from then on you show up to your contacts who use WhatsApp as a fellow WhatsApp user, which invites them to message you there even though you cannot receive their messages. To fix this, you have to find the option in WhatsApp to delete your account.

Tally:

Signal 1 WhatsApp 0

That's the point. They don't.

I don't want to speak for the parent commenter, but I think the concern is that the local app could be exfiltrating the contact list (and, by the exact same logic, message content as well) through some side channel that doesn't appear anywhere in the published source code. There's no way to rule that out unless (a) users build the APK from the published source code themselves, or (b) there is some way to prove that the APK received via the Play Store is identical to one built from that source code.

Is (b) achievable by all users who have this concern?

For the most part, and for Android users, (b) is achievable: https://signal.org/blog/reproducible-android/

Signal isn't a company. They're a non-profit. In addition, as others have mentioned, Signal works without giving it permission to read your contacts.