by tommoor 2264 days ago
The video must be decrypted to do the scaling, transcoding, and dynamic bitrate adjustment for different platforms and network speeds – there's no way around this.

All of the group video providers will be decoding video on the server to achieve the reliability everyone wants.

3 comments

The clients can dynamically adjust their sending rate and resolution. It is pretty easy to downgrade your video when not talking. It's also possible to number and label the encrypted frames (I-frames or B-frames), allowing decimation for low-bandwidth clients without re-encoding, which is expensive and slow.
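
A minimal sketch of that decimation idea, assuming each encrypted frame travels with a plaintext sequence number and frame-type label. The EncryptedFrame type and the decimate function are hypothetical names for illustration, not any provider's API. It also assumes B-frames are never used as references, so dropping them leaves the rest of the stream decodable:

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class EncryptedFrame:
    seq: int        # plaintext sequence number, visible to the server
    kind: str       # plaintext label: "I", "P" or "B" (assumed labelling scheme)
    payload: bytes  # ciphertext; the server forwards it without decrypting


def decimate(frames: Iterable[EncryptedFrame], low_bandwidth: bool) -> Iterator[EncryptedFrame]:
    """Forward every frame, except B-frames when the receiver is low on bandwidth."""
    for frame in frames:
        if low_bandwidth and frame.kind == "B":
            continue  # dropped using only the plaintext label
        yield frame


# A toy group of pictures; the payload bytes stand in for ciphertext.
stream = [EncryptedFrame(i, k, b"\x00" * 4) for i, k in enumerate("IBBPBBP")]
kept = [f.seq for f in decimate(stream, low_bandwidth=True)]
print(kept)  # [0, 3, 6]
```

The slow client ends up with a lower frame rate, and the server only ever reads the labels, never the payload.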

There are also ways to send additive resolution streams -- a base stream and additional detail layers (multiresolution like JPEG2000).

Clients already do this, yes, but to achieve the reliability that Zoom is renowned for you also need to dynamically adjust what is sent _TO_ individual clients.
Can the clients not advise the server what bandwidth and packet loss rate they are seeing, so the server can adjust their traffic rate? There is no reason not to allow a signalling channel separate to the E2E encrypted data channels.
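
As a rough sketch of such a signalling channel, each client could send a small plaintext report alongside the encrypted media. The field names, the JSON encoding and the 0.8 headroom factor are all assumptions made for illustration, not a description of any real product:

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ReceiverReport:
    client_id: str
    bandwidth_kbps: float  # bandwidth the client is currently observing
    loss_rate: float       # fraction of packets lost, 0.0 to 1.0


def encode_report(report: ReceiverReport) -> str:
    """Serialise a report for the unencrypted signalling channel."""
    return json.dumps(asdict(report))


def decode_report(message: str) -> ReceiverReport:
    """Parse a report on the server side."""
    return ReceiverReport(**json.loads(message))


def target_rate_kbps(report: ReceiverReport, headroom: float = 0.8) -> float:
    """Pick a send rate below the reported bandwidth, reduced further by loss."""
    return report.bandwidth_kbps * headroom * (1.0 - report.loss_rate)


report = ReceiverReport("client-42", bandwidth_kbps=1000.0, loss_rate=0.1)
received = decode_report(encode_report(report))
rate = target_rate_kbps(received)
```

The report carries only transport statistics, so the media channels can stay end-to-end encrypted.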
The server would then have to tell the streamer / talker to reduce their quality for that one client, degrading the stream for all the other clients that have a fine connection, right?
If you absolutely must have high frame rate for everybody then you might make that choice. But generally you could just drop B frames from the stream for the slow client, so they get a slower frame rate.

If you wanted to be fancy you could do distributed processing, whereby another node downscales and re-publishes the feed.

In the Jitsi thread (https://news.ycombinator.com/item?id=22758131) also currently on the front page, someone from Jitsi mentions that they do have a way around this and that they don't decode the video on the server:

"We use a technique called simulcast. It consists on making every participant "work a bit harder" for the good of the bunch.

That is, every participant sends 3 separate video resolutions to the server: 720p, 480p and 180p (this may change due to bandwidth constraints). Then the server will only forward the appropriate layer to each other participant. So, if you are only seeing me in a thumbnail it will only forward the 180p layer. If I become the active speaker (or you choose to pin me to the large view) the server will immediately switch to forwarding the 720p layer."
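
The forwarding decision described in the quote can be sketched roughly as below. The three heights come from the quote; the function name and the optional max_height cap for a constrained receiver are hypothetical, and this is not Jitsi's actual code:

```python
from typing import Optional

# Simulcast layers each participant uploads, by frame height (from the quote).
LAYERS = (720, 480, 180)


def select_layer(large_view: bool, max_height: Optional[int] = None) -> int:
    """Choose which uploaded layer the server forwards to one receiver.

    large_view: True if the sender is the active speaker or pinned by the receiver.
    max_height: assumed optional cap for a receiver with limited bandwidth.
    """
    candidates = [h for h in LAYERS if max_height is None or h <= max_height]
    if not candidates:
        return min(LAYERS)  # nothing fits the cap, so fall back to the smallest layer
    return max(candidates) if large_view else min(candidates)


print(select_layer(large_view=False))                 # 180
print(select_layer(large_view=True))                  # 720
print(select_layer(large_view=True, max_height=480))  # 480
```

The server only picks among streams the sender already encoded, so it never needs to decode or re-encode video.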

So s/All/Not all/. Note that Jitsi does not claim to have e2e encryption.

The Zoom article we are discussing here claims they do not do that.