Hacker News
Wiretapping End-To-End Encrypted VoIP Calls: Real-World Attacks on ZRTP (sufficientlysecure.org)
117 points by TjWallas 3383 days ago
4 comments

For those interested, Signal doesn't seem to use ZRTP anymore:

> The new Signal voice and video beta functionality eliminates the need for ZRTP. The "signaling" messages used to set up the voice/video beta calls (offer/answer SDPs, ICE candidates, etc) are transmitted over the normal Signal Protocol messaging channel, which binds the security of the call to that existing secure channel. It is no longer necessary to verify an additional SAS, which simplifies the calling experience.

https://whispersystems.org/blog/signal-video-calls-beta/

And it's not in beta anymore:

https://whispersystems.org/blog/signal-video-calls/
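The quoted post describes binding call security to the existing authenticated messaging channel. Here is a minimal sketch of that idea. It assumes a standard WebRTC DTLS-SRTP setup, in which the SDP carries an `a=fingerprint:sha-256 ...` line. If the SDP arrives over an end-to-end authenticated channel, the receiver only has to check that the certificate presented in the DTLS handshake hashes to that fingerprint. The function names are hypothetical, and this is not Signal's actual code.

```python
import hashlib
from typing import Optional


def sdp_fingerprint(sdp: str) -> Optional[str]:
    """Return the SHA-256 certificate fingerprint advertised in an SDP blob, if any."""
    for line in sdp.splitlines():
        if line.lower().startswith("a=fingerprint:sha-256 "):
            return line.split(" ", 1)[1].strip().upper()
    return None


def cert_fingerprint(der_cert: bytes) -> str:
    """Format SHA-256(DER certificate) as colon-separated uppercase hex, as used in SDP."""
    digest = hashlib.sha256(der_cert).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def dtls_peer_is_authentic(sdp_from_secure_channel: str, presented_der_cert: bytes) -> bool:
    """Accept the DTLS peer only if its certificate matches the SDP fingerprint.

    The SDP is trusted here only because (by assumption) it was delivered over
    the already-authenticated messaging channel.
    """
    expected = sdp_fingerprint(sdp_from_secure_channel)
    return expected is not None and expected == cert_fingerprint(presented_der_cert)
```

A MitM who substitutes their own DTLS certificate fails this check. Because the fingerprint is protected by the messaging channel, no separate per-call SAS comparison is needed.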

Author of the paper here.

Yup, with regard to Signal our findings are already obsolete :D I think the new Signal developments are great. It is better to allow only one key verification mechanism, for unified usability, and to also use key continuity. Previously, the SAS had to be verified again on every call.
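Key continuity means remembering a contact's identity key after first contact (or after one explicit verification). The client then alerts only when that key changes, instead of asking for re-verification on every call. Below is a minimal trust-on-first-use sketch. The class and method names are hypothetical; this is not Signal's API.

```python
import hashlib
from typing import Dict


class KeyContinuityStore:
    """Hypothetical trust-on-first-use store for contacts' identity keys."""

    def __init__(self) -> None:
        self._known: Dict[str, str] = {}  # contact -> fingerprint seen first

    @staticmethod
    def fingerprint(identity_key: bytes) -> str:
        return hashlib.sha256(identity_key).hexdigest()

    def check(self, contact: str, identity_key: bytes) -> str:
        """Return "first-use", "match", or "changed" for this contact's key."""
        fp = self.fingerprint(identity_key)
        known = self._known.get(contact)
        if known is None:
            self._known[contact] = fp  # remember the key seen on first contact
            return "first-use"
        return "match" if known == fp else "changed"
```

In this sketch, a "match" needs no user interaction. A "changed" result is the point where a client would warn the user and ask for verification again.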

But doesn't that mean that with Signal you now only have to wiretap it once and you're good to go, since there's no SAS to verify on every call?
Sure, but "wiretapping it once" would mean breaking a lot of well-studied and, so far, unbroken crypto.
That's sort of too bad, because it looks like Signal was one of the only implementations they audited that had no issues.

I hope these authors will eventually look at the new thing too.

More interesting would be to see how feasible it is to crack the in-band SAS authentication string when callers verify it verbally.

Deep learning, and the ability to train on a specific caller's voice [1] and then mimic it, might be an interesting attack vector. In practice, Silent Circle's implementation does something interesting: instead of SAS numbers it uses dictionary words, so you end up with something like "Pink Elephant Salad". You could probably MitM that. However, callers are then supposed to make some extra puns or discuss it a bit, saying something like "Ha-ha! I wonder how tasty an elephant salad would be". And if, after MitM-ing, the string on the other side was "Plastic Blue Llamas", then the MitM attack becomes more obvious.

[1] http://research.baidu.com/deep-voice-production-quality-text...
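A word-based SAS is just a human-friendly rendering of a few bits derived from the key agreement. The minimal sketch below makes several assumptions. It uses a hypothetical 16-word list (4 bits per word). As a stand-in for ZRTP's real KDF, it uses SHA-256 over a hypothetical "SAS" label. Real clients use standardized renderings such as the PGP word list, not this one. The sketch shows why the two strings differ under a MitM: each leg of the call has its own shared secret.

```python
import hashlib

# Hypothetical 16-word list (4 bits per word); real ZRTP clients use
# standardized renderings such as the PGP word list instead.
WORDS = [
    "pink", "elephant", "salad", "plastic", "blue", "llama", "river", "copper",
    "violin", "maple", "orbit", "tiger", "canvas", "lemon", "harbor", "quartz",
]


def short_auth_string(shared_secret: bytes, n_words: int = 3) -> str:
    """Render the leftmost 4 * n_words bits of a hash of the secret as words."""
    if not 1 <= n_words <= 8:
        raise ValueError("this sketch only uses the leftmost 32 bits")
    # Stand-in for ZRTP's KDF: SHA-256 over a label plus the shared secret.
    digest = hashlib.sha256(b"SAS" + shared_secret).digest()
    value = int.from_bytes(digest[:4], "big")  # leftmost 32 bits
    words = []
    for i in range(n_words):
        shift = 32 - 4 * (i + 1)  # consume 4 bits per word, most significant first
        words.append(WORDS[(value >> shift) & 0xF])
    return " ".join(words)


# Each leg of a MitM'd call has its own secret, so the two strings the
# callers read out agree only by chance (here 1 in 16**3 = 4096).
alice_leg = short_auth_string(b"secret negotiated by Alice and the attacker")
bob_leg = short_auth_string(b"secret negotiated by Bob and the attacker")
print(alice_leg, "|", bob_leg)
```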

Author of the paper here.

There is existing work testing the feasibility of impersonating another person's voice. We discuss it in the related work section at the end of the paper.

I think that in the long run, SAS will no longer be a sufficient authentication technique due to advances in speech synthesis. To prolong ZRTP's life, we propose using sentences instead of words/characters. This is discussed in detail in our best-practices section.

This is a fascinating and well-elaborated article!

I noticed that UX/UI matters and that the SAS should be made longer. What would you recommend for a good ZRTP implementation?

Or should we start discussing phasing out ZRTP in favor of something like the Matrix protocol, or even Signal's?
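On the SAS-length point: assume the attacker cannot steer the SAS value. ZRTP's hash commitment in the Commit message is meant to stop an attacker from trying many key-exchange values. Under that assumption, the two SAS strings produced during a MitM match only by chance. For a SAS of $b$ bits, rendered for instance as $k$ words drawn from a list of $W$ words:

```latex
\Pr[\mathrm{SAS}_{\text{Alice}} = \mathrm{SAS}_{\text{Bob}}] = 2^{-b},
\qquad b = k \log_2 W
```

Two cases from ZRTP's own renderings:

- **Base-256 (two PGP words, $b = 16$):** the chance is $2^{-16} = 1/65536 \approx 1.5 \times 10^{-5}$ per call.
- **Base-32 (four characters, $b = 20$):** the chance is $2^{-20} = 1/1048576 \approx 9.5 \times 10^{-7}$.

Each extra bit halves the attacker's odds, which is the arithmetic behind asking for a longer SAS.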

Both Matrix and Signal use WebRTC for VoIP, so the content is encrypted by default. Call setup and signaling are also encrypted by default with Signal, and can be with Matrix: it's automatic if the room from which a call is established is already encrypted.

I know Signal tries to prevent data leakage by forcing the Opus codec to use a constant bitrate (CBR) instead of its default variable bitrate (VBR). I'm not sure whether Matrix implements anything similar yet.
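With VBR, Opus packet sizes track the audio content. An observer who sees only encrypted packet lengths can therefore still learn something, while CBR makes the sizes uniform. One way a WebRTC client could request CBR is the `cbr=1` Opus format parameter defined in RFC 7587. This assumes the client edits the SDP before applying it. Since `cbr` expresses the receiving side's preference, both peers would need to advertise it. This is a hypothetical sketch, not Signal's actual code, and `force_opus_cbr` is an illustrative name.

```python
import re


def force_opus_cbr(sdp: str) -> str:
    """Return a copy of an SDP blob whose Opus fmtp line advertises cbr=1 (RFC 7587)."""
    eol = "\r\n" if "\r\n" in sdp else "\n"
    lines = sdp.split(eol)

    # Find the dynamic payload type assigned to Opus, e.g. "a=rtpmap:111 opus/48000/2".
    pt = None
    for line in lines:
        m = re.match(r"a=rtpmap:(\d+) opus/48000", line, re.IGNORECASE)
        if m:
            pt = m.group(1)
            break
    if pt is None:
        return sdp  # Opus not offered: leave the SDP untouched

    fmtp_prefix = f"a=fmtp:{pt} "
    out = []
    found_fmtp = False
    for line in lines:
        if line.startswith(fmtp_prefix):
            found_fmtp = True
            # Drop any existing cbr=... parameter, then request constant bitrate.
            params = [p for p in line[len(fmtp_prefix):].split(";")
                      if p and not p.strip().startswith("cbr=")]
            params.append("cbr=1")
            line = fmtp_prefix + ";".join(params)
        out.append(line)

    if not found_fmtp:
        # No fmtp line for Opus yet: add one right after its rtpmap line.
        idx = next(i for i, ln in enumerate(out)
                   if ln.lower().startswith(f"a=rtpmap:{pt} opus/"))
        out.insert(idx + 1, fmtp_prefix + "cbr=1")

    return eol.join(out)
```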

Instead of using fancy deep learning to fake the voice, why not simulate a degraded signal by adding noise and using a good text-to-speech program (after you have figured out the speaker's gender)?
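The degradation step is ordinary signal processing: add noise scaled to a target signal-to-noise ratio. Below is a minimal sketch that adds white Gaussian noise to a list of float samples. The function name and parameters are hypothetical. Real line noise and codec artifacts are not white, so this only approximates a "bad connection".

```python
import math
import random
from typing import List, Optional


def add_noise(samples: List[float], snr_db: float, seed: Optional[int] = None) -> List[float]:
    """Return samples plus white Gaussian noise at roughly the requested SNR (dB)."""
    if not samples:
        return []
    rng = random.Random(seed)
    signal_power = sum(s * s for s in samples) / len(samples)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10**(SNR_dB / 10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]
```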
AFAICT, this looks more like attacks on ZRTP implementations than attempts to find weaknesses in the underlying protocol.
One sort of assumes that from "Real-World Attacks", no?
Or real-world attacks on solid implementations.
This is fascinating. Thanks for writing this paper, guys.