by lazzarello 4739 days ago
With VoIP, signaling encryption and media encryption use two different keys. If you have the first, you do not have the second. This is difficult to understand without reading the SIP spec (RFC 3261), and very few bloggers talk openly about it because most people outside the VoIP niche only care about HTTPS as the "secure protocol stuff" and stop there.

Here's the trick: SIP has nothing to do with sound or video. It "establishes sessions". The typical SIP dialog flow involves a stack of other protocols. In order, they read like this:

SIP->TLS->SDP->ZRTP->SRTP

That dude in the middle is the Session Description Protocol. It describes what will happen in the future regarding the media stream. When the clients agree on this (codecs, IP addresses, ports, etc.), a full-duplex session is established between the two peers. The preceding TLS stuff, which depended on a CA, is now over. We are ready for round two.
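To make "describes what will happen" concrete, here is a minimal sketch of pulling the agreed media details out of an SDP body. The offer text, the address and the `parse_sdp` helper are made up for illustration, and it only handles the handful of line types shown, not the full SDP grammar.

```python
# Hypothetical SDP offer, as it might ride inside a SIP INVITE.
SDP_OFFER = """\
v=0
o=alice 2890844526 2890844526 IN IP4 198.51.100.10
s=-
c=IN IP4 198.51.100.10
t=0 0
m=audio 49170 RTP/AVP 0 8
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
"""


def parse_sdp(text):
    """Collect the media address, port, transport and codecs from an SDP body."""
    info = {"address": None, "port": None, "transport": None, "codecs": {}}
    for line in text.splitlines():
        # Every SDP line has the form <single letter>=<value>.
        if len(line) < 2 or line[1] != "=":
            continue
        kind, value = line[0], line[2:]
        if kind == "c":
            # c=<nettype> <addrtype> <address>
            info["address"] = value.split()[2]
        elif kind == "m":
            # m=<media> <port> <transport> <payload types...>
            _media, port, transport, *_payloads = value.split()
            info["port"] = int(port)
            info["transport"] = transport
        elif kind == "a" and value.startswith("rtpmap:"):
            # a=rtpmap:<payload type> <encoding>/<clock rate>
            payload, encoding = value[len("rtpmap:"):].split(maxsplit=1)
            info["codecs"][int(payload)] = encoding
    return info


session = parse_sdp(SDP_OFFER)
print(session["address"], session["port"], session["codecs"])
```

Nothing in here is sound or video yet. It is only the description of where the media will go and how it will be encoded.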

This is what you missed. We haven't even begun sending data over our media socket yet and the security stuff that depends on a central authority is finished.

Now that we can speak to each other, let's do that! But wait! My client has an alphanumeric string on the screen. This is called a Short Authentication String. You read your SAS to me and I read mine to you. If they match, we click "OK" and now we know the key protecting our conversation is shared only by the two of us, because we confirmed it with our words, not our fingers.
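A rough sketch of why reading the strings aloud works: each client derives the SAS from the key material it just negotiated, so two matching strings mean both ends hold the same secret. This is a simplification and not the exact ZRTP derivation (RFC 6189 defines its own KDF and inputs). The `short_auth_string` helper, the `b"SAS"` label and the example secrets are all assumptions for illustration.

```python
import hashlib
import hmac

# z-base-32 alphabet, used here to render 20 bits as four characters.
ZBASE32 = "ybndrfg8ejkmcpqxot1uwisza345h769"


def short_auth_string(shared_secret: bytes) -> str:
    """Derive a 4-character string from a shared secret (simplified sketch)."""
    digest = hmac.new(shared_secret, b"SAS", hashlib.sha256).digest()
    # Keep the leftmost 20 bits of the digest: four groups of five bits.
    bits = int.from_bytes(digest[:4], "big") >> 12
    return "".join(ZBASE32[(bits >> shift) & 0x1F] for shift in (15, 10, 5, 0))


# No one in the middle: both ends negotiated the same secret.
alice = short_auth_string(b"secret negotiated end to end")
bob = short_auth_string(b"secret negotiated end to end")
print(alice, bob, alice == bob)

# Someone in the middle: each leg has its own secret, so the strings
# the two humans read to each other will almost certainly differ.
alice_leg = short_auth_string(b"secret between alice and mallory")
bob_leg = short_auth_string(b"secret between mallory and bob")
print(alice_leg, bob_leg)
```

The string is short because a human has to say it out loud. An attacker gets one guess per call, so a few characters are enough in practice.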

If you would like to try this IRL, you can call lee@ostel.co. I'm online right now.