by pthatcherg 1577 days ago
Yes, you are right that there is also a simple loss-based congestion control mechanism (https://github.com/jech/galene/blob/e8fbfcb9ba532f733405b1c5...) and a min() between it and the REMB. I missed that part. However, that appears to still be immature, only in a different way than I thought.

If I'm right that only one server->client video stream (called a "down track" in the Galene code) is receiving the REMB message, then only that one will use the REMB value and the rest will fall back to loss-based congestion control. If I'm wrong and all the server->client video streams are receiving the REMB message, then all of them will use that same value, which will be higher than the value calculated by the loss-based congestion control for each stream independently, so in effect they will all be falling back to loss-based congestion control (when there is more than 1 video stream; for 1 video stream it probably works fine).
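
To make the second case concrete, here is a minimal sketch. It assumes that each down track sends at min(its own loss-based estimate, REMB), as described above, and that one REMB value describing the whole connection is applied to every track. The function name and all the numbers are made up for illustration; none of this is taken from the Galene code.

```python
def effective_rates(loss_estimates, remb):
    """Rate of each track when every track applies
    min(its own loss-based estimate, the shared REMB value)."""
    return [min(estimate, remb) for estimate in loss_estimates]


# Hypothetical numbers, in bits per second.
remb = 3_000_000                                  # client's estimate for the whole connection
loss_estimates = [1_200_000, 900_000, 1_500_000]  # one loss-based estimate per down track

rates = effective_rates(loss_estimates, remb)

# The shared REMB is larger than every per-track estimate, so the min()
# never selects it: each track is governed by its loss-based estimate.
print(rates)
print("sum of track rates:", sum(rates), "shared REMB:", remb)
```

With a single track the REMB can be the smaller value and so actually limit the rate, which is why the one-stream case probably works fine.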

Either way, it appears that each server->client video stream is independently running a loss-based congestion controller, all of which will be battling each other (like N TCP streams do). That can work, I guess, but it's better to run one congestion controller and then divide that bitrate among the various video streams, which is what I meant by "bitrate allocation".

In other words, selecting video layers to send is exactly what I mean by bitrate allocation. Sorry for being unclear about that. The code you linked to is estimating the client->server bitrate for a given video stream. What I was looking for is code that will take a bitrate (from one congestion control mechanism, whatever it may be) and then divide that between the various video streams that flow server->client by selecting which layers to forward. I couldn't find that, and now I see why: each video stream has its own congestion controller, and these apparently compete with each other, all of them likely being loss-based in practice.
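
As a rough illustration of what I mean by bitrate allocation, here is a minimal sketch that takes one budget and picks a layer for each stream. The `Stream` type, the greedy round-robin policy, and the bitrates are all assumptions made for the example; a real allocator would also need priorities and hysteresis, and this is not how Galene or any particular SFU does it.

```python
from dataclasses import dataclass


@dataclass
class Stream:
    name: str
    layer_bitrates: list  # bits per second for each layer, lowest layer first


def allocate_layers(streams, budget):
    """Choose one layer per stream so the total stays within `budget`.

    Every stream first gets its lowest layer (or -1, meaning "paused",
    if even that does not fit). Then streams are upgraded one layer at a
    time, round-robin, for as long as the next step still fits.
    Returns a dict mapping stream name to the chosen layer index.
    """
    chosen = {}
    remaining = budget
    for s in streams:
        base = s.layer_bitrates[0]
        if base <= remaining:
            chosen[s.name] = 0
            remaining -= base
        else:
            chosen[s.name] = -1

    upgraded = True
    while upgraded:
        upgraded = False
        for s in streams:
            idx = chosen[s.name]
            if idx < 0 or idx + 1 >= len(s.layer_bitrates):
                continue  # paused, or already at the top layer
            step = s.layer_bitrates[idx + 1] - s.layer_bitrates[idx]
            if step <= remaining:
                chosen[s.name] = idx + 1
                remaining -= step
                upgraded = True
    return chosen


# Hypothetical simulcast ladder: 150 kbps, 500 kbps, 1.5 Mbps.
ladder = [150_000, 500_000, 1_500_000]
streams = [Stream("a", ladder), Stream("b", ladder), Stream("c", ladder)]
print(allocate_layers(streams, 2_000_000))
```

The point of the design is that the congestion controller produces a single number and the allocator is the only place that decides which stream gets which share of it.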

Loss-based congestion control for video conferencing isn't as good as latency-based congestion control because it only backs off once queues have filled up enough to drop packets, so it will cause more latency.

Thanks for pointing out what I was missing. Now that I understand it better, I can see that it would work fine for 1 video stream (when there are only 2 clients in the call), but then likely falls back to loss-based congestion control for more than 2 clients, which will work, but not as well.

If this is accurate, then I'd make a few suggestions for Galene:

1. Use one "maxBitrate" calculation per-"rtpConn" instead of per-"downTrack" and then divide that bitrate between those downTracks rather than doing N such independent calculations. This will avoid the problems from having congestion controllers competing with each other.

2. Feed the REMB value from the receiving client into the unified calculation. Then you'll get the benefit of latency-based congestion control (assuming the client is doing latency-based congestion control).

3. Switch the loss-based mechanism in the server to a latency-based system using something like transport-cc (which I think Pion supports).


Ah, I see where your confusion stems from.

Galène doesn't bundle multiple streams in a single PeerConnection: it puts each stream (audio+video) in its own PeerConnection. Thus, if we can assume that the audio traffic does not significantly contribute to congestion, then performing congestion control per-track or per-connection is exactly equivalent.

(It's a tradeoff. Bundling reduces the amount of ICE traffic and makes for faster connection establishment, but it ends up putting multiple unrelated streams into a single transport-layer flow, which confuses traffic shapers and AQMs. I'm betting that things like fq-codel are being deployed as we speak; if my bet is wrong, then bundling will turn out to be the better choice.)

Oh, that's interesting. That's not as bad as I thought, although I am surprised that's the approach that was taken (unbundled).

In that case, if you're relying on REMB from WebRTC with multiple PeerConnections, then, yes, each will be handled separately and you'll likely not end up with loss-based congestion control but will be using the latency-based congestion control WebRTC is doing.

Some potential problems I can think of with this approach:

- It won't scale to a large number of streams. I'm not sure at what point it would cause problems, but I'm guessing it would work OK for... 20? 40? Most of the time, the extra ICE traffic and per-connection RTCP is probably small. A long time ago, there were issues with per-PeerConnection memory usage, but I think we (back when I worked on the WebRTC team) fixed them to a reasonable degree, so it should work for 20-40. You have to do a bunch of DTLS handshakes, but you could get around that by using SDES. You'll be opening a lot of NAT bindings, potentially at the same time. Again, with a smaller number of streams, this might be fine. But at some point, you may cause issues with consumer-grade NATs.

- The WebRTC receive-side congestion controllers (sending back REMB) will effectively be competing with one another. I'm not sure how well they work with a large number doing so (or even a small number.) I'd be interested to hear how many you can have running at the same time before you notice problems.

- You can't easily prioritize one stream over the others if you want an "active speaker" view. I suppose you could do something where you "steal" bits from one PeerConnection and give them to another, but that will probably contradict what you're trying to do with playing nicely with traffic shapers and AQM.

I'd be very interested to hear your findings on how traffic shapers and AQM respond to non-bundled traffic vs bundled traffic. What characteristics do you see work better or worse? Higher rates? Lower latencies? Lower jitter? Less loss? And how do you test (given that network behavior can vary so widely from one network to the next)?

If you'd like to talk more directly I'm "peter at signal.org".

It looks like we're now understanding each other.

> It won't scale to a large number of streams.

Galene was designed for lectures and conferences, where a small number (1-5) of streams are sent to hundreds or thousands of receivers (and the budget is virtually nonexistent, because teaching and public research are eternally underfunded). It works beautifully for that particular application. It also happens to work well for medium-size meetings (25 senders, 50 receivers), which is a nice bonus, but not what the software was designed for.

> I'd be interested to hear how many you can have running at the same time before you notice problems.

We've been doing staff meetings with 25 senders and 50 receivers (50 people attending, half of whom have their camera switched off). However, our main goal is supporting large lectures (2 flows distributed to hundreds of students, with students only switching their camera on in order to ask a question or show their cat): the commercial offerings work reasonably well for meetings, but are completely ill-suited to lecturing.

> You can't easily prioritize one stream over the others if you want an "active speaker" view.

Oh, that. We simply forcibly switch the background streams to the lowest spatial layer. It works very well, but relies on the senders implementing either simulcast or SVC. Which is something we may safely assume (all desktop browsers implement at least simulcast, and lecturing is done with laptops).
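
A minimal sketch of that policy, for illustration only. The text above says just that background streams are forced to the lowest spatial layer; the function name, the stream ids, and the choice to give the active speaker the top layer are assumptions, and this is not the Galene code.

```python
def select_spatial_layers(stream_ids, active_speaker, top_layer):
    """Give the active speaker `top_layer` and pin every other stream
    to spatial layer 0 (the lowest one)."""
    return {
        sid: (top_layer if sid == active_speaker else 0)
        for sid in stream_ids
    }


# Hypothetical call with three senders, each offering layers 0..2.
print(select_spatial_layers(["alice", "bob", "carol"], "bob", 2))
```

When the active speaker changes, calling the function again with the new id swaps which stream gets the high layer.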

If your main focus is to mostly receive 1-2 video streams while sending 0, then, yeah, I guess bundle vs non-bundle doesn't matter much. I'm pleasantly surprised to hear that 25 PeerConnections work well for you.

About simulcast: why would a sender not support simulcast? Something to do with VP8 hardware encoders?

I'm still interested to hear what you have learned about AQM and the like.