| So I have an affection for Discord... so I appreciate posts like this. >"Using the WebRTC native library allows us to use a lower level API from WebRTC (webrtc::Call) to create both send stream and receive stream." So I'm gathering that Discord's voice servers receive multiple persistent connections, then compress the audio streams for delivery to each end user. THIS part is where I can't imagine the on-the-fly CPU usage. Each client's outgoing stream also needs to exclude that client's own audio to prevent an echo effect (no point in hearing your own voice), which means a separate compression stream per user. >"All the voice channels within a guild are assigned to the same Discord Voice server." I imagine this helps significantly with I/O in converting live streams into 1 stream per end user. I've dealt with video compression (only in ffmpeg) and live timestamp syncing, and I can say from experience that this is no easy feat. I understand these are audio streams (so lower overhead), but the persistent voice server still needs to handle the incoming connections, WebSocket heartbeats (negligible), compression (high I/O), and delivery of the streams (high memory usage too). I'm impressed, but would love to hear the specs on the media servers and their DL/UL speeds. My old setup to deliver live video (in sync and compressed) was 6 mini-ITX boards with 4GB of RAM each and i3s... my bottleneck was my ISP, which I solved with multiple DOCSIS modems and an internal switch (each board had 2 Ethernet ports). |
The bulk of the user-space time on the SFU is spent doing encryption (XSalsa20/DTLS). We also avoid memory allocations in the hot paths, using fixed-size ring buffers as much as possible.
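For anyone unfamiliar with the idea, here is a rough sketch of a fixed-size ring buffer of packet slots where push and pop never allocate. This is only an illustration, not our actual code: the names (`ring_t`, `ring_push`, `ring_pop`) and the sizes (`RING_SLOTS`, `SLOT_BYTES`) are made up, and it assumes a single thread owns the ring.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes, chosen only for illustration. */
#define RING_SLOTS 8u    /* capacity; must be a power of two */
#define SLOT_BYTES 1500u /* largest packet a slot can hold */

typedef struct {
    uint16_t len;              /* bytes used in data[] */
    uint8_t  data[SLOT_BYTES]; /* packet payload, stored inline */
} slot_t;

typedef struct {
    slot_t   slots[RING_SLOTS]; /* all storage is preallocated here */
    uint32_t head;              /* total packets ever pushed */
    uint32_t tail;              /* total packets ever popped */
} ring_t;

static void ring_init(ring_t *r) { r->head = r->tail = 0; }

/* Unsigned subtraction stays correct when the counters wrap around. */
static size_t ring_count(const ring_t *r) { return (size_t)(r->head - r->tail); }

/* Copy a packet into the next free slot. Returns false if the ring is
 * full or the packet does not fit. No heap allocation happens here. */
static bool ring_push(ring_t *r, const void *pkt, size_t len) {
    if (len > SLOT_BYTES || ring_count(r) == RING_SLOTS) return false;
    slot_t *s = &r->slots[r->head & (RING_SLOTS - 1u)];
    memcpy(s->data, pkt, len);
    s->len = (uint16_t)len;
    r->head++;
    return true;
}

/* Copy the oldest packet into out (truncated to cap bytes) and report
 * the copied length. Returns false if the ring is empty. */
static bool ring_pop(ring_t *r, void *out, size_t cap, size_t *out_len) {
    if (ring_count(r) == 0) return false;
    const slot_t *s = &r->slots[r->tail & (RING_SLOTS - 1u)];
    size_t n = s->len < cap ? s->len : cap;
    memcpy(out, s->data, n);
    *out_len = n;
    r->tail++;
    return true;
}
```

The point of the design is that memory use is fixed up front, so a burst of traffic shows up as a full ring (and a dropped packet) rather than as allocator pressure in the hot path.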
Additionally, we coalesce sends using sendmmsg to reduce syscalls in the write path: http://man7.org/linux/man-pages/man2/sendmmsg.2.html
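To show what coalescing sends looks like, here is a minimal Linux-only sketch that hands several datagrams to the kernel with a single `sendmmsg` call. It is not our production code: `send_batch` and `MAX_BATCH` are hypothetical names, and it assumes the socket is already connected, so no per-message destination address is set.

```c
#define _GNU_SOURCE /* sendmmsg and struct mmsghdr are GNU/Linux extensions */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define MAX_BATCH 16u /* hypothetical upper bound on datagrams per call */

/* Send up to MAX_BATCH datagrams on a connected socket with one syscall.
 * Returns the number of datagrams sent, or -1 on error with errno set.
 * A return value smaller than count means the remainder must be retried. */
static int send_batch(int fd, const void *const bufs[], const size_t lens[],
                      unsigned int count) {
    struct mmsghdr msgs[MAX_BATCH];
    struct iovec iov[MAX_BATCH];

    assert(bufs != NULL && lens != NULL);
    if (count > MAX_BATCH) count = MAX_BATCH;

    memset(msgs, 0, sizeof msgs);
    for (unsigned int i = 0; i < count; i++) {
        /* iov_base is not const-qualified, but sending only reads it. */
        iov[i].iov_base = (void *)bufs[i];
        iov[i].iov_len = lens[i];
        msgs[i].msg_hdr.msg_iov = &iov[i]; /* one buffer per datagram */
        msgs[i].msg_hdr.msg_iovlen = 1;
    }
    return sendmmsg(fd, msgs, count, 0);
}
```

Each entry in the array is still its own datagram on the wire; the saving is only in the number of user/kernel transitions, which adds up when the same server is writing to many receivers every few milliseconds.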
I posted some about the specs here: https://news.ycombinator.com/item?id=17954163