by sibit 1485 days ago
Is anyone on HN using WebRTC in production? I recall watching a conference talk by Martin Kleppmann a few years ago where he was discussing CRDTs and Automerge. He mentioned that they had attempted to use WebRTC, but it wasn't reliable, so they had to use WebSockets with a custom message relay server instead.

Google Meet is WebRTC in production. As is Discord video and audio, Facebook messenger video, and Whatsapp calls. There are several WebRTC-as-a-service platforms that are relatively large and are used by diverse applications (meetings, telehealth, teaching/tutoring, events). I co-founded and work at one of them (Daily.co YC W16).

WebRTC is almost the only choice for low-latency video and audio inside a web browser. The open source libwebrtc [1] implementation that's in Chromium and Safari is now mature enough to be used in other native applications if you have a medium-sized engineering team and are comfortable with C++. (Again, WebRTC-as-a-service platforms often provide native libraries that wrap libwebrtc to give you easier-to-use, full-stack iOS, Android, etc. SDKs.)

The three big challenges with WebRTC are that low-latency media is its own domain and the learning curve is steep, that scaling sessions to more than two or three people requires a lot of server-side packet routing code (you can't do pure peer-to-peer with lots of participants), and that there aren't yet the mature "off the shelf" cloud building blocks that exist for HTTP-ish workloads.
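
A quick back-of-the-envelope sketch of why pure peer-to-peer stops scaling, assuming every participant sends one media stream and wants everyone else's. It compares a full mesh with a forwarding server (an SFU, selective forwarding unit, which is one common form of server-side packet routing); the function names are made up for illustration.

```python
def mesh_load(n: int) -> dict:
    """Per-participant load in a full peer-to-peer mesh of n participants."""
    return {
        "uplink_streams": n - 1,    # a separate copy uploaded to every other peer
        "downlink_streams": n - 1,  # one stream from every other peer
        "peer_connections": n - 1,
    }


def sfu_load(n: int) -> dict:
    """Per-participant load when a forwarding server (SFU) relays the media."""
    return {
        "uplink_streams": 1,        # upload once; the server fans it out
        "downlink_streams": n - 1,  # still receive everyone else's stream
        "peer_connections": 1,      # a single connection to the server
    }


def mesh_total_connections(n: int) -> int:
    """Total peer connections in the whole mesh: n choose 2."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    for n in (2, 3, 6, 12):
        print(f"{n:>2} people: mesh uplink={mesh_load(n)['uplink_streams']}, "
              f"SFU uplink={sfu_load(n)['uplink_streams']}, "
              f"mesh connections={mesh_total_connections(n)}")
```

At 12 people, a mesh asks every participant to upload 11 copies over 66 connections in total, which is roughly where the server-side routing work becomes unavoidable. Real SFUs add things like simulcast and per-receiver layer selection on top, which this sketch ignores.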

WebRTC data channels are the non-video/audio part of the WebRTC spec. My hot take: data channels are rarely the right solution to any problem description that doesn't start with, "well, I already have a WebRTC transport open ..."

[1] https://webrtc.googlesource.com/src

> My hot take: data channels are rarely the right solution to any problem description that doesn't start with, "well, I already have a WebRTC transport open ..."

On the off chance that you're still checking this thread, would you care to elaborate? Specifically, if one isn't using WebRTC for media, is there a better off-the-shelf solution for P2P with NAT traversal in native apps?

I think the deciding factor for reliability is whether they are using TURN or only STUN.

Our team got off the ground really quickly using https://github.com/feross/simple-peer to handle the majority of the WebRTC client implementation. We're sending video and voice, so websockets aren't feasible. I'd say it was a lot easier than I expected coming in cold, and about 95% of connections establish quickly and don't have any problems.

However for that remaining 5%, I have a lot to learn. Using an abstraction is great when it works, but I'm interested in going through OP's project to get a better sense of what's happening when things go wrong.

You're so right. In most cases you won't have a problem if you build on existing products, but in the remaining 5%, lots of things can go wrong :) That is exactly why WebRTC Nuts and Bolts exists!

Glad that simple-peer was helpful to you :)

It may not meet the qualifications of "production," but about a year ago I made an OBS.Ninja clone with some specifics for live streamers, and I was extremely satisfied with the reliability of WebRTC -- multiple hour streams with zero dropouts, and without any fancy code to handle reconnects or adjusting for lower bandwidth. It just kinda magically works. The browser implementations are an absolute disaster, but if you can make those limitations work for the project (or if you don't need to use a browser at all), then I'd feel pretty confident using it in production and at scale.

NewTek actually uses WebRTC for NDI's remote networking, and while the NDI software itself is prone to crashing and probably not usable for production, the connection to the remote system is never an issue.

Did you use homebrew STUN/TURN servers too?

I don't find the WebRTC signaling and setup particularly noteworthy, but once you try to connect nodes on different networks, you're pretty much dependent on some third party.

Luckily there are a lot of cheap STUN/TURN services out there, and if you really need something under your control, there are containerized projects on GitHub that make it easy to run your own. Even when I used it as a Zoom replacement for meetings, though, I never ran into a situation where TURN was necessary, and that includes people behind corporate firewalls. It seems as though corporate netops learned some lessons during the pandemic and loosened restrictions.

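
Why TURN so rarely kicks in: during ICE, STUN yields server-reflexive candidates and TURN yields relayed ones, and every candidate gets a priority. The sketch below uses the priority formulas and recommended type preferences from RFC 8445; the local-preference and component-ID defaults are assumptions for a single interface carrying RTP, and real ICE agents pick their own values.

```python
# Recommended type preferences from RFC 8445: direct paths outrank relays.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, local_pref: int = 65535,
                       component_id: int = 1) -> int:
    """ICE candidate priority: 2^24*type_pref + 2^8*local_pref + (256 - component_id)."""
    return ((TYPE_PREFERENCE[cand_type] << 24)
            + (local_pref << 8)
            + (256 - component_id))


def pair_priority(controlling: int, controlled: int) -> int:
    """Candidate-pair priority from the two sides' candidate priorities (G, D)."""
    g, d = controlling, controlled
    return (1 << 32) * min(g, d) + 2 * max(g, d) + (1 if g > d else 0)


if __name__ == "__main__":
    for kind in ("host", "srflx", "relay"):
        print(kind, candidate_priority(kind))
```

Connectivity checks are ordered by pair priority, so a relayed (TURN) pair is normally used only when no direct or STUN-assisted pair works, which matches the experience of TURN rarely being necessary.
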
My software uses it to send video, audio and metadata from a C++ server to the browser. I found WebRTC to be a nightmare, causing this feature to take months longer than expected to implement.

What library did you use for WebRTC? What were the pain points you hit?

Originally we used Google's WebRTC, but it didn't offer enough control; now we use libdatachannel. Getting the peer connections set up, getting what we wanted out of the SDP negotiation, and restructuring the video and audio data into the correct format were some of the issues I remember running into.

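
When you work below a full stack like libwebrtc, part of getting media "into the correct format" is RTP packetization. Here is a minimal Python sketch of the 12-byte RTP fixed header from RFC 3550, assuming version 2 with no padding, no header extension and no CSRCs. It is an illustration, not how libdatachannel does it internally; payload type 96 in the example is just a common dynamic type and would have to match whatever the SDP negotiation settled on.

```python
import struct

RTP_HEADER = struct.Struct("!BBHII")  # 12 bytes, network byte order


def pack_rtp_header(payload_type: int, seq: int, timestamp: int, ssrc: int,
                    marker: bool = False) -> bytes:
    """Pack the RTP fixed header with V=2, P=0, X=0, CC=0."""
    byte0 = 2 << 6  # version 2, no padding, no extension, zero CSRCs
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return RTP_HEADER.pack(byte0, byte1, seq & 0xFFFF,
                           timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(data: bytes) -> dict:
    """Parse the first 12 bytes of an RTP packet back into fields."""
    byte0, byte1, seq, timestamp, ssrc = RTP_HEADER.unpack_from(data)
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }
```

For video, RTP timestamps typically use a 90 kHz clock, so at 30 fps the timestamp advances by 3000 per frame, and the marker bit is typically set on the last packet of each frame. The sequence number wraps at 16 bits, which the masking above handles.
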
Yes! I have used it in production with these.

* FaceTime @ Apple https://support.apple.com/en-us/HT212619

* KVS and Chime @ AWS https://github.com/awslabs/amazon-kinesis-video-streams-webr.... Lots of security cameras and robots use it, not public though.

* Lightstream https://golightstream.com . Cloud compositing and other magic.

I also have something I am working on now that isn't public yet that is using WebRTC. Really excited to see what people build with it/what it inspires next.

It's kind of amazing how many places you'll find WebRTC: Stadia, Boston Dynamics, Zoom, Meet, security systems, drones, etc. It's likely that you use WebRTC in production every day :)

I think Zoom doesn't use WebRTC; they have their own decoder, transmission, and retry layer. If you try to transmit raw frames directly, you end up with codec issues, since frames have dependencies among them. E.g., the VPX series cannot decode frames without a golden frame or a key frame, so these need to be kept as state for a decoder. Every other frame is an inter-frame that refers to these frames, and there's usually no limit to how many inter-frames can appear between reference frames (key frames or golden frames).

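
A toy model of the dependency problem described above, assuming each inter-frame predicts from exactly one earlier frame (real codecs can use several references). It shows how a single lost frame makes everything after it undecodable until the next key frame, and how referencing an older frame that did arrive (such as a golden frame) limits the damage. The frame lists and function name are hypothetical.

```python
from typing import List, Optional, Set, Tuple

# (frame_type, index_of_reference_frame); key frames reference nothing.
Frame = Tuple[str, Optional[int]]


def decodable_frames(frames: List[Frame], lost: Set[int]) -> List[int]:
    """Indices of frames a decoder can reconstruct, given which frames were lost."""
    ok: Set[int] = set()
    for i, (frame_type, ref) in enumerate(frames):
        if i in lost:
            continue
        # A key frame stands alone; an inter-frame needs its reference decoded first.
        if frame_type == "key" or ref in ok:
            ok.add(i)
    return sorted(ok)


if __name__ == "__main__":
    # Key frames at 0 and 6; every inter-frame references the previous frame.
    chain = [("key", None), ("inter", 0), ("inter", 1), ("inter", 2),
             ("inter", 3), ("inter", 4), ("key", None), ("inter", 6)]
    print(decodable_frames(chain, lost={2}))   # [0, 1, 6, 7]

    # Same stream, but frame 4 predicts from frame 0 (e.g. a golden frame).
    golden = list(chain)
    golden[4] = ("inter", 0)
    print(decodable_frames(golden, lost={2}))  # [0, 1, 4, 5, 6, 7]
```

This is why a transport that just ships encoded frames needs either retransmission, fresh key frames, or an encoder that can switch references after a loss.
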
I don't work at Zoom/only have outsider information. It looks like they are using Media over DataChannels[0]. So they are still using WebRTC!

When Google announced WebCodecs/WebTransport they said Zoom was involved, so maybe they will switch to that eventually?

[0] https://webrtchacks.com/zoom-avoids-using-webrtc/

Do data channels over WebRTC enforce different semantics than, say, QUIC or HTTP/2? I understand that the difference between TCP and WebRTC-based communication would be the application-level delivery guarantees, but does it really differ from UDP-based HTTP implementations?

The biggest problem with sending media over data channels is that there's no good way to do bandwidth estimation. Data channels weren't designed to be used for media, and the current WebRTC spec (and javascript implementation) doesn't expose enough control of either the codec or the network stack to implement real bandwidth estimation and bandwidth control. This is presumably the main reason that Zoom's in-browser implementation is so limited in functionality.
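
To give a feel for what "real bandwidth estimation and bandwidth control" involves, here is a toy estimator driven by measured queuing delay: back off multiplicatively when packets start queuing, probe upward otherwise. It is loosely inspired by delay-based schemes such as Google Congestion Control but is not that algorithm, and the threshold and factors are invented. It also assumes the two things the comment says are hard to get with data channels: per-packet timing feedback, and a way to feed the result into the encoder's target bitrate.

```python
class ToyBandwidthEstimator:
    """Bitrate estimate from queuing-delay feedback (illustrative only)."""

    def __init__(self, initial_bps: int = 300_000, min_bps: int = 50_000,
                 max_bps: int = 5_000_000, delay_threshold_ms: float = 25.0) -> None:
        self.bps = initial_bps
        self.min_bps = min_bps
        self.max_bps = max_bps
        self.delay_threshold_ms = delay_threshold_ms

    def on_feedback(self, queuing_delay_ms: float) -> int:
        """Update the estimate from one feedback report and return it in bits/s."""
        if queuing_delay_ms > self.delay_threshold_ms:
            # Delay is growing: a queue is forming, so cut the rate (x0.85).
            self.bps = max(self.min_bps, self.bps * 85 // 100)
        else:
            # No queue building up: probe for more capacity (x1.08).
            self.bps = min(self.max_bps, self.bps * 108 // 100)
        return self.bps
```

Each returned value would become the encoder's target bitrate; without access to the encoder and to packet timing, a data-channel sender can only guess.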

There's a spec for RTP over QUIC [1]. It's really cool! But obviously very early days.

[1] https://datatracker.ietf.org/doc/draft-engelbart-rtp-over-qu...

Zoom uses Media over DataChannels only for the browser version of their product. For the desktop and mobile clients they use their own implementation.

In research by Stanford a few years ago, WebRTC performed substantially worse than some other communication systems when the available data rate of the network connection varied in certain ways (e.g., when the available data rate dropped for about ten seconds): https://www.youtube.com/watch?v=nuI4F5akBIs&t=2571s

This conflates implementation and protocol. You can use whatever bandwidth estimator/congestion control algorithm you want.

I think it is worth measuring how Google's implementation works, but it is tuned for a very specific use case by a single company.

(Speaker in that video here)

You're right that this is measuring the WebRTC.org codebase (as used in Chrome, Firefox, etc.), not necessarily the WebRTC protocol. Better URL and demo/talk videos are here: https://snr.stanford.edu/salsify

But the issue probably isn't with the bandwidth estimator or congestion-control algorithm -- you probably can't fix this by taking some WebRTC implementation and plugging in better ones. The core issues as we see them are about the architecture of the WebRTC.org codebase, and frankly all WebRTC source/sink implementations we're aware of, in particular:

(a) even with perfect bandwidth estimation, libvpx and libx264 (and, we think, typical hardware encoders) as configured are bad at achieving the requested bitrate over short timescales, meaning that "overlarge" coded frames are regularly being sent, and techniques like reference invalidation or golden/altref encoding are never (?) used to skip sending an overlarge coded frame or to retry encoding the same frame at a lower quality before sending. [At a technical level, the interface between the encoder VBV buffer model and the bandwidth estimator/CC algorithm is very inconvenient -- to have these two control loops running independently, trying to do similar things at similar timescales, isn't great.]

(b) loss recovery does not work well, and again features like reference invalidation to recover quickly do not seem to be well-used in practice (the second half of https://youtu.be/jaDelb4JnP4 makes this pretty clear),

(c) the WebRTC.org codebase is so complex, with so many modes that it can get settled in, that trying to reason about these behaviors or explore them systematically is quite challenging, and

(d) because there are several layers of buffering on the receiver-side, and because the sender-side code will change things like the camera's frame rate, it's hard to measure application-level metrics [e.g. lens-to-display and microphone-to-speaker latency] robustly in a deployment, especially across a diverse hardware or OS base. (And it's easy to get a false sense of security from the network-level WebRTC metrics that are available.)
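
A rough sketch of the alternative that point (a) hints at: treat the network estimate as a per-frame byte budget, encode a frame at more than one quality, and send the largest version that fits, or skip the frame entirely. The budget model (estimate times a target queuing delay, minus bytes already in flight) and the function names are hypothetical simplifications, not WebRTC.org's logic or that of any particular published system.

```python
from typing import Optional, Sequence


def budget_for_frame(estimate_bps: int, in_flight_bytes: int,
                     target_delay_ms: int = 100) -> int:
    """Bytes we can add now without the queue exceeding target_delay_ms (toy model)."""
    allowed = estimate_bps * target_delay_ms // (8 * 1000)  # bits/s * ms -> bytes
    return max(0, allowed - in_flight_bytes)


def pick_encoding(candidate_sizes: Sequence[int], budget_bytes: int) -> Optional[int]:
    """Index of the largest encoded version that fits the budget, or None to skip."""
    best: Optional[int] = None
    for i, size in enumerate(candidate_sizes):
        if size <= budget_bytes and (best is None or size > candidate_sizes[best]):
            best = i
    return best


if __name__ == "__main__":
    # The same frame encoded at a higher and a lower quality (sizes in bytes, invented).
    versions = [12_000, 6_000]
    for in_flight in (0, 4_000, 12_000):
        budget = budget_for_frame(1_000_000, in_flight)
        print(in_flight, budget, pick_encoding(versions, budget))
```

In a real codec, skipping a frame only works if the encoder's state is rolled back so later frames never reference the skipped one, which is exactly the kind of encoder/transport coupling the comment says the current interfaces make awkward.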

It's probably possible to produce a WebRTC source/sink implementation that works better over challenged networks and has good monitoring of application-level latency, but it would be a big job afaik. Our work was partly funded by Google, we had a high-up Google sponsor, we gave multiple talks at Google, etc., but it was challenging even to find "the people in charge" of this cross-modular stuff to talk with them, because I think the codebase in some respects mirrors the org chart. E.g. you have video compression people worrying about the encoder (and wanting to be able to plug in libvpx, libx264, and a bunch of hardware encoders to the same interface), and networking people worrying about the bandwidth estimator and CC algorithm, and it's sort of way too late to say that the interfaces or architecture needs to be refactored or that the complexity has gotten out of control. To Google's credit, they have since driven the industry to produce standardized APIs for "functional" codecs, and functional decoder ASICs now exist (not sure about encoders yet), so there is progress being made on that front at least.

I am. My use case is a computer with two devices in the same building, and they share data over WebRTC data channels.

There are some quirks in getting all the signaling working, but there’s now more of a standard way to do that process right…

Anyway since the devices are on the same network latency is nearly 0…

I do have an ack and retry process … I should add logging to see how often that happens though
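
An application-level ack/retry layer like this can be sketched as below, assuming each message carries a sequence number and the receiver acks every copy it sees. The class names are hypothetical, time is passed in explicitly so any timer can drive it, and the `retries` and `failed` fields are the kind of counters worth logging. Data channels are reliable and ordered by default, so a layer like this mainly adds end-to-end confirmation there; on unordered or partially reliable channels it does real work. Note that this sketch does not restore ordering.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Pending:
    payload: bytes
    sent_at: float
    attempts: int = 1


class AckRetrySender:
    """Resends un-acked messages after a timeout, up to max_attempts sends."""

    def __init__(self, send: Callable[[int, bytes], None],
                 timeout_s: float = 0.25, max_attempts: int = 5) -> None:
        self.send = send
        self.timeout_s = timeout_s
        self.max_attempts = max_attempts
        self.next_seq = 0
        self.pending: Dict[int, Pending] = {}
        self.retries = 0             # how often the retry path actually runs
        self.failed: List[int] = []  # sequence numbers we gave up on

    def send_message(self, payload: bytes, now: float) -> int:
        seq = self.next_seq
        self.next_seq += 1
        self.pending[seq] = Pending(payload, now)
        self.send(seq, payload)
        return seq

    def on_ack(self, seq: int) -> None:
        self.pending.pop(seq, None)  # duplicate or late acks are harmless

    def on_tick(self, now: float) -> None:
        for seq, p in list(self.pending.items()):
            if now - p.sent_at < self.timeout_s:
                continue
            if p.attempts >= self.max_attempts:
                del self.pending[seq]
                self.failed.append(seq)
                continue
            p.attempts += 1
            p.sent_at = now
            self.retries += 1
            self.send(seq, p.payload)


class DedupReceiver:
    """Acks every copy it sees but delivers each sequence number only once."""

    def __init__(self, send_ack: Callable[[int], None]) -> None:
        self.send_ack = send_ack
        self.seen: Set[int] = set()
        self.delivered: List[bytes] = []

    def on_message(self, seq: int, payload: bytes) -> None:
        self.send_ack(seq)  # re-ack duplicates: the first ack may have been lost
        if seq not in self.seen:
            self.seen.add(seq)
            self.delivered.append(payload)
```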

Hi, my team used it to develop a browser-based video conferencing system at the company I used to work for. You're right, WebRTC data channels may be unreliable, but UDP by nature is prone to packet loss, out-of-order packets, etc. To deal with these problems, the WebRTC standard offers some error detection/correction/prevention techniques. I used it in production for video/audio transfer, but for data streaming (and also signaling) I preferred WebSockets. Working with the WebRTC standard involves some quirks, which is the reason my project (WebRTC Nuts and Bolts) was born.

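
One of the error-correction techniques mentioned above is forward error correction (FEC): the sender adds redundant data so the receiver can rebuild a lost packet without waiting for a retransmission. The FEC schemes used with WebRTC (such as ULPFEC and FlexFEC) are XOR-based; the sketch below shows only the core XOR-parity idea for a group of equal-length packets, not those wire formats.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, zero-padding the shorter one."""
    n = max(len(a), len(b))
    a, b = a.ljust(n, b"\0"), b.ljust(n, b"\0")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: List[bytes]) -> bytes:
    """One parity packet protecting a whole group of media packets."""
    return reduce(xor_bytes, packets)


def recover(received: List[Optional[bytes]], parity: bytes) -> List[Optional[bytes]]:
    """Rebuild at most one missing packet (None) in the group from the parity packet."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can only repair one loss per group")
    repaired = list(received)
    if missing:
        # XOR of the parity with every surviving packet leaves the lost one.
        present = [p for p in received if p is not None]
        repaired[missing[0]] = reduce(xor_bytes, present, parity)
    return repaired
```

The cost is one extra packet per group; the receiver saves a round trip when exactly one packet in the group is lost and has to fall back to retransmission otherwise. Real schemes also carry a length-recovery field, which this sketch skips by assuming equal-length packets.
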
I use it for Rambly.app. It works pretty well, but there are definitely cases where it fails for some people.

Not in customer hands yet, but yes: WebRTC with WSS for signaling. It’s been incredibly reliable.

Edit: WSS is WebSocket Secure, i.e. a WebSocket connection over TLS (wss://).
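
WebRTC deliberately leaves signaling to the application, which is why a WebSocket channel is such a common choice. The sketch below shows only the relay logic such a server might run, with the WebSocket transport left out; the message shape (`type`, `to`, `sdp`) and the class name are hypothetical, not part of any standard.

```python
import json
from typing import Callable, Dict

RELAYED_TYPES = {"offer", "answer", "candidate"}


class SignalingRoom:
    """Relays SDP offers/answers and ICE candidates between peers in one room."""

    def __init__(self) -> None:
        # peer_id -> function that pushes a text message to that peer's socket
        self.peers: Dict[str, Callable[[str], None]] = {}

    def join(self, peer_id: str, send: Callable[[str], None]) -> None:
        self.peers[peer_id] = send

    def leave(self, peer_id: str) -> None:
        self.peers.pop(peer_id, None)

    def handle(self, sender_id: str, raw: str) -> bool:
        """Forward one JSON message to its addressee; return False if it was dropped."""
        try:
            msg = json.loads(raw)
        except json.JSONDecodeError:
            return False
        if not isinstance(msg, dict) or msg.get("type") not in RELAYED_TYPES:
            return False
        to = msg.get("to")
        if not isinstance(to, str) or to not in self.peers:
            return False
        msg["from"] = sender_id  # stamped by the server, never trusted from the client
        self.peers[to](json.dumps(msg))
        return True
```

In a real deployment each `send` callable would write to that peer's WebSocket, and you would want authentication and room-membership checks before relaying anything.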

Pretty sure Discord uses WebRTC.

Jam (https://github.com/jam-systems/jam) is based on WebRTC: audio rooms like Twitter Spaces or Clubhouse.

We use it at https://testingbot.com to provide a realtime video stream of remote desktops and mobile device screens, mostly with Pion (Go).

We use it at https://tandem.chat. I work on the AV stack. WebRTC is pretty awesome.

We use WebRTC in production as a way to provide remote desktop access to data center computers in a browser. It has its ups and downs.