For those interested, Signal doesn't seem to use ZRTP anymore:
> The new Signal voice and video beta functionality eliminates the need for ZRTP. The "signaling" messages used to set up the voice/video beta calls (offer/answer SDPs, ICE candidates, etc) are transmitted over the normal Signal Protocol messaging channel, which binds the security of the call to that existing secure channel. It is no longer necessary to verify an additional SAS, which simplifies the calling experience.
Yup, in regards to Signal our findings are already obsolete :D
I think that the new Signal developments are great. It is better to allow only one key verification mechanism for unified usability and also use key continuity. Before, SAS needed to be verified for each call again.
It would be more interesting to see how feasible it is to attack the in-band SAS (short authentication string) when callers verify it verbally.
Deep learning and the ability to train on a specific caller's voice [1] and then mimic it might be an interesting attack vector. In practice, Silent Circle's implementation does something interesting: instead of SAS numbers it uses dictionary words, so you end up with something like "Pink Elephant Salad". Could probably MitM that. However, callers are then supposed to make some extra puns or discuss it a bit and say something like "Ha-ha! Wonder how tasty an elephant salad would be". And if, after MitM-ing, the string on the other side was "Plastic Blue Llamas", then the MitM attack becomes more obvious.
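For anyone who wants to picture the word-based SAS, here is a rough Python sketch of the general idea: both sides hash the same key material and map a few bits of the digest onto dictionary words. Everything in it is an assumption for illustration. The 16-entry `WORDS` list, the `sas_words` and `guess_probability` helpers, and the plain SHA-256 over a byte string are all made up here; real ZRTP derives the SAS from its own key-agreement hash, and word-rendering clients typically use the much larger PGP word lists.

```python
import hashlib

# Hypothetical 16-entry word list, for illustration only (4 bits per word).
WORDS = [
    "pink", "elephant", "salad", "plastic",
    "blue", "llama", "green", "river",
    "silent", "copper", "maple", "orbit",
    "velvet", "tiger", "lemon", "anchor",
]


def sas_words(key_material: bytes, count: int = 3) -> str:
    """Map the first `count` digest bytes onto words (low 4 bits of each byte)."""
    digest = hashlib.sha256(key_material).digest()
    return " ".join(WORDS[digest[i] & 0x0F] for i in range(count))


def guess_probability(bits: int) -> float:
    """Chance that a single blind guess matches a SAS carrying `bits` bits."""
    return 2.0 ** -bits


# Both endpoints compute the string from the same key material, so they
# read out the same words unless someone in the middle swapped the keys.
alice = sas_words(b"example key material")
bob = sas_words(b"example key material")
print(alice, "|", bob)
print(guess_probability(16))
```

With a MitM, each side ends up with different key material and therefore (most likely) different words, which is exactly what the verbal comparison is meant to catch. The sketch says nothing about voice imitation, which is the harder problem discussed here.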
There is existing work on testing the feasibility of impersonating another person's voice. We discuss it in the related work section at the end of the paper.
I think that in the long run SAS will no longer be a sufficient authentication technique due to advances in speech synthesis. To prolong ZRTP's life we propose using sentences instead of words/characters. This is discussed in detail in our best practices section.
This is a fascinating and well elaborated article!
I noticed that UX/UI is important and that the SAS should probably increase in length. What would you recommend for a good ZRTP implementation?
Or should we start discussing phasing out ZRTP and moving to something like the Matrix protocol or even Signal's?
Both Matrix and Signal use WebRTC for VoIP, so the content is encrypted by default. Call setup and signaling are also encrypted by default with Signal, and optionally with Matrix: it's automatic if the room from which a call is established is already encrypted.
I know Signal attempts to prevent any data leakage by forcing the Opus codec to use a constant bitrate instead of its default VBR -- I'm not sure if Matrix implements anything similar yet.
Instead of using fancy deep learning to fake the voice, why not simulate a degraded signal by adding noise and using a good text-to-speech program (after you have figured out the speaker's gender)?
> The new Signal voice and video beta functionality eliminates the need for ZRTP. The "signaling" messages used to set up the voice/video beta calls (offer/answer SDPs, ICE candidates, etc) are transmitted over the normal Signal Protocol messaging channel, which binds the security of the call to that existing secure channel. It is no longer necessary to verify an additional SAS, which simplifies the calling experience.
https://whispersystems.org/blog/signal-video-calls-beta/
And it's not in beta anymore:
https://whispersystems.org/blog/signal-video-calls/