by neuralkoi 43 days ago
I can see why they rewrote QUIC in Rust and for use in userspace, though going the in-house approach would warrant keeping an eye on the relevant kernel commits like a hawk to avoid missing bug fixes like these. These in-house implementations tend to have less eyeballs than the kernel.

I found it interesting that Cloudflare is not yet using BBR as the default in quiche. CUBIC's recovery in this day and age, and especially in datacenters with large pipes, seems so slooow to me. Almost two seconds with no loss whatsoever till achieving BDP again and then shooting itself in the foot every time it hits the ceiling. Each one of those losses a retransmission.
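
For a rough sense of where that recovery time comes from, here is a minimal sketch of the CUBIC window curve. It assumes the constants from RFC 9438 (C = 0.4, beta = 0.7), measures the window in segments and time in seconds, and ignores the Reno-friendly region and everything else a real implementation does. The function names and the w_max value are illustrative, not taken from quiche or the kernel.

```python
# Constants as given in RFC 9438 (assumed here, not read from any codebase).
C = 0.4      # cubic scaling constant
BETA = 0.7   # window is reduced to BETA * w_max on a loss


def time_to_recover(w_max):
    """K: seconds for the cubic curve to climb from BETA * w_max back to w_max."""
    return (w_max * (1.0 - BETA) / C) ** (1.0 / 3.0)


def cubic_window(t, w_max):
    """Window (in segments) t seconds after the last loss, ignoring Reno-friendliness."""
    k = time_to_recover(w_max)
    return C * (t - k) ** 3 + w_max


# Illustrative value only: a window of 100 segments at the time of the loss.
w_max = 100.0
k = time_to_recover(w_max)
print(f"window right after loss: {cubic_window(0.0, w_max):.1f} segments")
print(f"seconds to get back to w_max: {k:.2f}")
```

Since K grows with the cube root of w_max and does not depend on RTT in this simplified form, larger windows take longer to recover, which is part of why the curve feels slow on big pipes.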

2 comments

> though going the in-house approach would warrant keeping an eye on the relevant kernel commits like a hawk to avoid missing bug fixes like these. These in-house implementations tend to have less eyeballs than the kernel.

This is somewhat funny to read because this specific issue in CUBIC (a sudden CWND jump upon exiting quiescence) was originally discovered in Google's QUIC library and then later reported to the team working on the TCP stack. I know this because I was the one who found that bug back in 2015.
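
For readers unfamiliar with this class of bug, here is a toy sketch of how an idle period can make a CUBIC window jump. It is not taken from Google's QUIC library, the kernel, or quiche. The class, its method names, and the numbers are hypothetical, and the constants are assumed from RFC 9438. The idea is that the window is a function of time since the epoch started, so if idle time counts toward that elapsed time, the sender resumes far along the curve without having probed the network at all. One way to avoid that is to shift the epoch start forward by the idle duration.

```python
C = 0.4      # cubic scaling constant (assumed from RFC 9438)
BETA = 0.7   # multiplicative decrease factor (assumed from RFC 9438)


class ToyCubic:
    """Toy model: only the cubic curve, no loss handling, no Reno-friendly region."""

    def __init__(self, w_max, epoch_start, shift_epoch_on_idle):
        self.w_max = w_max
        self.epoch_start = epoch_start
        self.shift_epoch_on_idle = shift_epoch_on_idle
        # Time for the curve to climb from BETA * w_max back to w_max.
        self.k = (w_max * (1.0 - BETA) / C) ** (1.0 / 3.0)

    def on_idle_end(self, idle_duration):
        # With the mitigation enabled, idle time does not count toward the epoch.
        if self.shift_epoch_on_idle:
            self.epoch_start += idle_duration

    def cwnd(self, now):
        t = now - self.epoch_start
        return C * (t - self.k) ** 3 + self.w_max


# Hypothetical scenario: loss at t=0, sender goes idle at t=1 and resumes at t=11.
naive = ToyCubic(w_max=100.0, epoch_start=0.0, shift_epoch_on_idle=False)
fixed = ToyCubic(w_max=100.0, epoch_start=0.0, shift_epoch_on_idle=True)

before_idle = naive.cwnd(1.0)
naive.on_idle_end(10.0)
fixed.on_idle_end(10.0)

print(f"before idle:       {before_idle:.1f}")
print(f"after idle, naive: {naive.cwnd(11.0):.1f}")
print(f"after idle, fixed: {fixed.cwnd(11.0):.1f}")
```

In the naive variant the window after the idle period lands well above w_max, while the variant that shifts the epoch resumes where it left off.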

That said, congestion control algorithms are really prone to logic bugs, and very subtle changes in the algorithm can often lead to dramatically different outcomes. Because of that, there's a lot of value in running congestion control code that has been tested on a wide variety of real Internet traffic.

Would formal validation of these algorithms (e.g. with TLA+) help avoid such bugs?
I thought this had been done in the past, and that some of the flaws found were addressed in Facebook's version of this:

https://doi.org/10.1145/3452296.3472912

Toward formally verifying congestion control behavior | Proceedings of the 2021 ACM SIGCOMM 2021 Conference

I think an audited algorithm in which every type is strictly defined (e.g. int32), on top of that, would really help pin down exactly what should be input to it so that it remains correct.
> I can see why they rewrote QUIC in Rust and for use in userspace

As far as I know, while they might have either way, they did not ("rewrite QUIC [...] for use in userspace"): the Linux kernel implementation only landed in late 2025. Quiche was started ca. 2018 (that's when Cloudflare started beta-deploying QUIC; the first public alpha of quiche was in January 2019).

I don't know that there even was an in-kernel implementation of QUIC before msquic.sys, which I believe first shipped in Server 2022 circa mid-2021 (and is used as the implementation backend by MsQuic on Server 2022 and W11).

I think the original commenter either meant taking the CUBIC implementation from the kernel and rewriting that in Rust for use in their QUIC implementation, or they just jumbled their wording. It does make sense to use an existing battle-tested implementation of a congestion control algorithm, because there are potentially many real-world failure modes that you might not anticipate if you try to write an implementation from scratch.
Yes, I meant the CUBIC implementation! But I'm glad I made the mistake: I learned some interesting things from the responses above.