Hacker News
by ohnoesjmr 794 days ago
I've heard about MPTCP back in 2013.

It made so much sense back then, when mobile apps were not that robust to networks changing, I assumed it's going to get adopted in no time due to how much of a ux improvement it would have been back in the day.

It's incredibly depressing that this gained barely any traction in the last 10 years, and that kernel options are appearing only now, after everyone has wrapped their HTTP calls in multiple retry handlers, and mobile operating systems have abstracted network connectivity to the point where it feels more like you are using ZeroMQ than TCP.

6 comments

I suspect that a lot of innovation energy moved to QUIC, because with TCP your nice new variant can be randomly nobbled by middleboxes. For example, see https://blog.apnic.net/2021/12/08/efficient-multipath-transp...
QUIC is a step backwards here; it has no multipath support: https://lwn.net/Articles/964377/

Multipath: There are several areas where TCP still has an advantage over QUIC. One of those is multipath support. Multipath TCP connections can send data on different network paths simultaneously — for example, sending via both WiFi and cellular data — to provide better throughput than either path permits individually.

Server connection migration is explicitly forbidden by QUIC:

https://github.com/quicwg/base-drafts/pull/2031

The draft multipath extension is here: https://datatracker.ietf.org/doc/draft-ietf-quic-multipath/

It's on the standards track, rather than experimental, so likely to be supported once finished. There seem to be some implementations, including Apple:

https://github.com/quicwg/multipath/wiki/QUIC-Implementation...

QUIC is a step back, IMHO, especially given how many national networks work poorly with UDP protocols.
If QUIC adoption grows, that will motivate network providers to improve UDP performance and connectivity
Nobody cares about those middleboxes, those are only relevant in corporate networks.
Such middleboxes can also be seen in cellular networks. (And firewall in free access points / guest networks)
>Such middleboxes can also be seen in cellular networks.

Complain to your ISP if they meddle with a layer they are not supposed to meddle with.

>(And firewall in free access points / guest networks)

I consider those as "corporate".

I wanted to like it, and Apple included it in iOS, but supporting it on real servers was going to be too hard...

When I was deployed on FreeBSD with no load balancers, there weren't recent patches. And even if there were, I'd need to do some serious work to avoid advertising the private network ips as alternates...

When I was on Linux behind a load balancer, it's too complex to get the streams to the right place. And the load balancer doesn't want to do it anyway.

Processing two streams together involves a lot of complexity in a high throughput code path. It's a lot of risk, and you've got to reboot for changes.

And then you do all that work and it only benefits iOS users, who tend to be on better networks anyway.

Apple also contributed[1] MPTCP support to Envoy Proxy.

[1]https://github.com/envoyproxy/envoy/pull/18780

> iOS users, who tend to be on better networks anyway.

I don't think there is any basis to claim that.

> A U.S. analysis of Wi-Fi and mobile Internet usage across unique smartphones on the iOS and Android platforms reveals that 71 percent of all unique iPhones used both mobile and Wi-Fi networks to connect to the Internet, while only 32 percent of unique Android mobile phones used both types of connections. A further analysis of this pattern of behavior in the U.K. shows consistent results, as 87 percent of unique iPhones used both mobile and Wi-Fi networks for web access compared to a lower 57 percent of Android phones.

https://www.comscore.com/lat/Prensa-y-Eventos/Infographics/i...

Since wi-fi networks tend to be higher quality than cell networks, what you provided works against the point I responded to.
Someone paying for a premium phone is probably also inclined to pay for a premium mobile network.
Not all iPhones are 'premium' phones, and there are not really 'premium' cell phone networks in the US. Or anywhere.
Lol, have you ever been to Europe? iPhones are definitely considered premium and there definitely are networks that are more expensive but offer better reception. In Germany, that would be Telekom, in Switzerland, it's Swisscom.
Yes, I used to live in Germany. I was talking mainly about the US though.

iPhone isn't always 'premium', since they have their version of cheap phones as well. Point is cell network service quality is independent from phone quality.

A lot of this is bought in instalments isn’t it?
It sounds like this would have taken off if it were added to various managed cloud load balancers based on what you're saying.

The only question I have is whether it opens up a different can of worms, whether or not you've got a magic box terminating layer 7 for you. I never dug deep enough into MPTCP myself to know.

I think it's a no brainer if it's no effort or small effort (set a socket option on the client, somehow)... but it's a big effort to support it in a large load balancing situation.
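For what "small effort on the client" can look like: on Linux, as far as I know, MPTCP is requested as a protocol number when the socket is created rather than through a setsockopt call. Below is a minimal sketch, assuming a Linux kernel with MPTCP enabled (net.mptcp.enabled=1). The helper name open_stream_socket is made up for illustration, and the fallback to plain TCP is my assumption about what a cautious client would want.

```python
import socket

# Linux uses protocol number 262 for MPTCP. Python exposes it as
# socket.IPPROTO_MPTCP on Linux from 3.10 onward; the literal is a fallback
# for older builds.
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)


def open_stream_socket(family=socket.AF_INET):
    """Return (sock, used_mptcp): try MPTCP first, fall back to plain TCP.

    Hypothetical helper, not part of any library.
    """
    try:
        sock = socket.socket(family, socket.SOCK_STREAM, IPPROTO_MPTCP)
        return sock, True
    except OSError:
        # Kernel without MPTCP, MPTCP disabled via sysctl, or a non-Linux OS.
        sock = socket.socket(family, socket.SOCK_STREAM, socket.IPPROTO_TCP)
        return sock, False
```

The returned socket is used like any other stream socket (connect, send, recv). Whether extra subflows are actually established depends on the path manager configuration and on the server, which is the hard part discussed in this thread.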

If you balance your load balancers with ECMP, I don't know if you can get two client streams to the same mptcp terminating place.

If you've optimized the heck out of your tcp flows, this throws a wrench in there, because the second stream is likely to get hashed into a different nic queue, and then you have communication between cpus to move forward on the logical stream.

It would have been really handy though, and would have solved real issues with real users.

Edit to add: it could also solve some issues I saw on private networking / interserver networking, although the contention would be a much bigger problem on higher bandwidth streams. On networks with link aggregation, while there are many paths from one host to another, path selection is usually done by hashing the connection 5-tuple {src ip, dst ip, protocol, src port, dst port}, so a long running TCP connection remains on the same path for its duration. If a path segment has high loss/corruption or is congested, MPTCP could help if you had an extra connection that hit a different path. Otherwise, you need to find the segment and get network operations to fix it. It's not easy to figure that out (I had to write a tool to sample and find port combinations with trouble, and then a patch for mtr to run a trace with fixed ports), and you still need to reconnect your affected TCP sockets unless you can get a quick response from net ops. (Sometimes they can check error stats once the right devices are pointed out to them; replacing a cable/fiber often helps, and disconnecting it during investigation can let the traffic flow across the redundant links.)
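To make the 5-tuple hashing point concrete, here is a rough sketch of how a flow gets pinned to one path and how changing only the source port can move it. The hash (SHA-256 truncated to 32 bits) is an assumption chosen for illustration; real switches use their own, usually undocumented, hash functions. Both function names are hypothetical.

```python
import hashlib


def ecmp_path(src_ip, dst_ip, proto, src_port, dst_port, n_paths):
    """Pick a path index from the 5-tuple (illustrative hash, not a vendor's)."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_paths


def find_port_on_other_path(src_ip, dst_ip, proto, dst_port, n_paths,
                            avoid_path, port_range=range(49152, 65536)):
    """Return the first source port whose flow avoids avoid_path, or None."""
    for port in port_range:
        path = ecmp_path(src_ip, dst_ip, proto, port, dst_port, n_paths)
        if path != avoid_path:
            return port
    return None
```

The same 5-tuple always maps to the same path, which is why a long-lived connection stays stuck on a bad segment. A second connection (or an MPTCP subflow) from a different source port has a chance of landing elsewhere.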

> If you balance your load balancers with ECMP, I don't know if you can get two client streams to the same mptcp terminating place.

At Google, we do something similar with QUIC and connection migration. Our mechanism for ensuring these hit the same backend is Maglev [0], where we use the QUIC connection ID for hashing purposes in software. (Our routers still mostly use ECMP based on the 5-tuple, so being able to consistently hash to the same backend across multiple LB instances is crucial.)

> if a path segment has high loss/corruption or is congested, MPTCP could help if you had an extra connection that hit a different path.

Incidentally, we also have a family of internal mechanisms that do this, although we don't rely on MPTCP. (We instead twiddle some other bits in the packet that we make sure our routers use for hashing, at least for RPCs between prod machines.) This inspired some of the connection migration work in our QUIC implementation [1], wherein we can migrate to a different ephemeral port if we detect issues with the current path. This works shockingly often for routing around network problems.

[0] https://research.google/pubs/maglev-a-fast-and-reliable-soft...

[1] https://github.com/google/quiche/blob/main/quiche/quic/core/...
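A rough illustration of why hashing on the connection ID instead of the 5-tuple keeps a migrating client on the same backend. This is not Maglev's algorithm (Maglev builds a permutation-based lookup table); the sketch substitutes rendezvous (highest-random-weight) hashing because it fits in a few lines. The packet dictionaries and backend names are invented for the example.

```python
import hashlib


def pick_backend(connection_id: bytes, backends):
    """Rendezvous hashing keyed on the connection ID only."""
    def score(backend):
        return hashlib.sha256(connection_id + b"|" + backend.encode()).digest()
    # Every load balancer instance computes the same scores, so they all agree.
    return max(backends, key=score)


def route_packet(packet, backends):
    """Ignore the source address and port; only the connection ID decides."""
    return pick_backend(packet["connection_id"], backends)
```

Two packets carrying the same connection ID reach the same backend even if the client's address and port changed in between, and the result does not depend on which load balancer instance saw the packet or in what order it lists the backends.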

> I've heard about MPTCP back in 2013.

> I assumed it's going to get adopted in no time due to how much of a ux improvement it would have been back in the day.

You might also be interested in SCTP[1] from the year 2000, which also hasn't gotten any traction so far.

[1]: https://en.wikipedia.org/wiki/Stream_Control_Transmission_Pr...

> You might also be interested in SCTP[1] from the year 2000, which also hasn't gotten any traction so far.

Probably partly because middleware boxes (e.g., firewalls) either didn't (or don't) support it, or had rules written to only allow "TCP" (as opposed to 'stream') or "UDP" (as opposed to 'dgram'; see also "DCCP").

Certainly that's a part, but it didn't help that SCTP has some fundamental low-level flaws.

Given that TCP also has at least one unfixable flaw, the only recommendation I can make is to use something UDP-based - which, to make sure you don't stomp on everybody else's traffic, means use the only popular one: QUIC (the layer beneath HTTP/3).

The protocol is specified by a byte in the IP packet; how many middleware boxes block everything except for ICMP, TCP, and UDP? What is the probability that a packet with that byte set to something unexpected actually gets from source to destination?
The “funny” thing is that HTTP/3 really, really looks like a transport protocol encapsulated into… UDP. Exactly because many middleboxes block anything that’s not a very well known protocol.
The internet is just broken and only works because of a lot of hacked-on band-aids.
> The protocol is specified by a byte in the IP packet; how many middleware boxes block everything except for ICMP, TCP, and UDP?

Most firewalls are default deny out of the box and you have to allow things through. How many folks bother opening up SCTP/DCCP/etc?
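For anyone who hasn't looked at the header: in IPv4 the protocol is the single byte at offset 9 (IPv6 has the equivalent Next Header field). Here is a small sketch of a default-deny check on that byte. The protocol numbers are the IANA-assigned ones; the allowlist and the function names are hypothetical and only stand in for what a middlebox might be configured to do.

```python
import struct

# IANA-assigned IP protocol numbers.
ICMP, TCP, UDP, DCCP, SCTP = 1, 6, 17, 33, 132

# Hypothetical default-deny allowlist: only the "well known" protocols pass.
ALLOWED = {ICMP, TCP, UDP}


def ipv4_header(protocol, src="192.0.2.1", dst="198.51.100.1"):
    """Build a minimal 20-byte IPv4 header (checksum left at 0 for brevity)."""
    src_b = bytes(int(x) for x in src.split("."))
    dst_b = bytes(int(x) for x in dst.split("."))
    return struct.pack("!BBHHHBBH4s4s",
                       0x45, 0, 20, 0, 0, 64, protocol, 0, src_b, dst_b)


def protocol_of(header: bytes) -> int:
    """The protocol field is the byte at offset 9 of the IPv4 header."""
    return header[9]


def middlebox_forwards(header: bytes) -> bool:
    """Default deny: forward only protocols on the allowlist."""
    return protocol_of(header) in ALLOWED
```

Under this allowlist a packet with protocol 132 (SCTP) or 33 (DCCP) is dropped before anything looks at its payload, which is the deployment problem being described.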

How does SCTP work with the NAT that your typical home box uses?
SCTP can run over UDP. It's part of the spec.

Now we have HTTP3 which runs over UDP - where there is a will, there is a way.

Perhaps SCTP was ahead of its time.

> SCTP can run over UDP. It's part of the spec.

SCTP over UDP came out in 2013:

* https://datatracker.ietf.org/doc/html/rfc6951

SCTP came out in 2000:

* https://datatracker.ietf.org/doc/html/rfc2960

Over a decade is quite a while in Internet-time.

SCTP is used a lot inside telco networks for carrying switching control metadata for voice connections. https://en.wikipedia.org/wiki/SIGTRAN
WebRTC data channels use SCTP, which ain't nothing! https://datatracker.ietf.org/doc/html/rfc8831

(SCTP over DTLS, that is...)

I was excited about it because we were working on delivery robots and I wanted a good solution for instant failover given 2 cellular modems.

We ended up going with PepLink's SpeedFusion to save engineering time. But the license was costly. I really hope for a free solution in the future for 2 cellular networks and <50ms failover.

Multipath UDP + OpenVPN would also probably be a viable solution.

I created something like what you're describing with the addition of P2P communication using NAT traversal (https://www.hyperpath.ie)

It will connect your devices in a P2P Mesh VPN and allow them to send and receive data using multiple links (e.g. multiple 5G or 5G + Satellite).

It is significantly cheaper than Peplink's license, less latency and no bandwidth / data limits.

You need to bring your own hardware though. Like a Raspberry Pi with 3 USB 4G/5G dongles.

what about something like this? two minipcie slots which i suppose you could put two cellular modems into. not sure what OS it runs though but presumably some flavor of linux.

https://mikrotik.com/product/rbm33g#fndtn-specifications

maybe someone could make one that uses an RPi compute module instead.

It looks like it runs "RouterOS" which has a Linux kernel, so it should be possible to run it there.

I found this board on AliExpress (https://www.aliexpress.com/item/1005003540616473.html?spm=a2...) based on the CM4 and with 3 cellular modems. That could also be a good candidate.

Also found this one (https://www.gateworks.com/products/industrial-single-board-c...) US-made and has 3 minipcie slots (other options available with 2 and 4)

Hehe, I also worked on a delivery robot with exactly the same problem. We ended up licensing Phantom Auto. Expensive and... not particularly amazing.
How was the connectivity with Phantom Auto?
I see it as depressing that this is gaining traction it doesn't deserve. TCP doesn't need one hack bolted on at a time, leaving us to choose combinations that sort of work in half the use cases of the modern world; it needs to be replaced with SCTP.
ZeroMQ yeah!