by r2vcap 38 days ago
This is frustratingly one-sided writing. Yeah, WebRTC has limitations, but relying on a standard buys you a lot of correctness and reduces long-term engineering cost. The fact that WebRTC is complicated does not mean it is wrong; it means real-time media over the public internet is complicated.

Also, networking is inherently stateful. NAT traversal, jitter buffers, congestion control, packet loss, codec state, encryption, and session routing do not disappear because you put audio over TCP or WebSocket. Pretending otherwise is not architectural clarity. It is just moving the complexity somewhere less visible.

4 comments

You might have noticed that the author started the blog post explaining themselves:

  Like 6 years ago I wrote a WebRTC SFU at Twitch.
  Originally we used Pion (Go) just like OpenAI,
  but forked after benchmarking revealed that it was too slow.
  I ended up rewriting every protocol, because of course I did!

  Just a year ago, I was at Discord and I rewrote the WebRTC SFU in Rust.
  Because of course I did! You’re probably noticing a trend.

  Fun Fact: WebRTC consists of ~45 RFCs dating back to the early 2000s.
  And some de-facto standards that are technically drafts (ex. TWCC, REMB).
  Not a fun fact when you have to implement them all.

  You should consider me a Certified WebRTC Expert.
  Which is why I never, never want to use WebRTC again.
I think that they've done more than enough of 'trying the normal way' to be warranted in having an opinion the other way, don't you think?
Yes, agreed. I also found it obvious that they have proven their worth as an expert on this subject matter. Many times, over and over.
But ChatGPT said …
Right, but they also state they have never implemented TURN, which IMO is a marker of WebRTC expertise. (I haven't btw, but the WebRTC experts I know have all written or worked on a TURN implementation at some point.)
It's not that strange. TURN has two main use cases: peer-to-peer when no viable direct path can be found and working around very strict firewalls. Based on the author's experience the first isn't relevant and the second isn't much of a concern for Twitch and Discord. For the latter case HTTP/3 is helping make TURN unnecessary because you can, as the author observes, run UDP over port 443.
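To illustrate why TURN is the path of last resort: ICE (RFC 8445) ranks candidates with a priority formula, and relayed (TURN) candidates get the lowest recommended type preference. A minimal sketch, assuming the RFC's recommended type preferences and a single local interface; the names `TYPE_PREFERENCE` and `ice_priority` are made up for illustration and do not come from any library:

```python
# Recommended type preferences from RFC 8445, section 5.1.2.2.
# Higher means "try this kind of path first".
TYPE_PREFERENCE = {
    "host": 126,   # direct local address
    "prflx": 110,  # peer-reflexive, discovered during connectivity checks
    "srflx": 100,  # server-reflexive, learned via STUN
    "relay": 0,    # relayed through a TURN server
}


def ice_priority(candidate_type: str, local_pref: int = 65535, component_id: int = 1) -> int:
    """Candidate priority per RFC 8445, section 5.1.2.1.

    local_pref=65535 assumes a host with a single IP address;
    component_id=1 is the first component (e.g. RTP).
    """
    type_pref = TYPE_PREFERENCE[candidate_type]
    return (type_pref << 24) + (local_pref << 8) + (256 - component_id)


if __name__ == "__main__":
    # Sort candidate types from most to least preferred.
    for kind in sorted(TYPE_PREFERENCE, key=ice_priority, reverse=True):
        print(kind, ice_priority(kind))
```

With these defaults the relay candidate always sorts last, so a TURN path is only selected when the checks on every more direct candidate pair fail.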
> This is frustratingly one-sided writing

Tangential, but by being that, it's also refreshingly human writing, vs. the both-sidesy, bullet-listed AI pablum that's all around us these days.

I have zero take on the subject matter, but I like that the article had a detectably human flair.

And if it was AI written, god help us.

“How hard can it be?” the strawman asked.

It’s 2026 and teleconferencing is still such a shit show. There are billions of dollars to be had and Zoom is at best mediocre, and it can be as bad as Microsoft Whatchamacallit. I’ve never not seen teleconferencing be a ham-handed mess.

FaceTime does alright in the consumer segment.
The most frustrating thing about FaceTime is it sometimes appears to significantly duck audio in order to avoid echoes. I can't predict on which devices it will happen, but it often does when I call my parents and it absolutely destroys the conversation. If they're telling me something and I make the slightest "uhuh" acknowledgment sound, their mic input gets effectively muted for a second or so and I miss what they say.
I’ve found that if the recipient is wearing headphones/earphones, you can freely interrupt them without getting ducked (and vice-versa). Doesn’t help much when you’re calling multiple people / have to be on speaker, but makes it predictable at least.
Well sure, that's the easy case for echo cancellation (remove the source of echoes entirely).

But we've had solutions to this for decades, and they don't involve ducking the recipient's audio. Apple and their billions of dollars should have had this completely solved by now.

QUIC is also a standard.