by digitallis42 641 days ago
I skimmed it and didn't see any explanation of why one would want it over TCP. Did I miss it, or is it non-obvious?

From a cursory look:

- looks dead simple

- no IP layer (there's a ttpip folder in that repo though)

- distributed congestion control (TCP has a "window" field + a bunch of tentative RFCs, this has a purposeful "congestion")

- 100% implementable in hardware (TCP can, but it's complex)

Not a general TCP replacement, but the README properly highlights a "many endpoints local link" use case:

> the protocol executed entirely in hardware and deployed to a very large multi-ExaFlops (fp16) supercomputer with over 10s of thousands of concurrent endpoints. This protocol does not need a CPU or OS to be involved in any way to link and execute.

In Tesla's presentation slides, "Tesla Transport Protocol Over Ethernet (TTPoE): A New Lossy, Exa-Scale Fabric for the Dojo AI Supercomputer", they mentioned that the network layer is optional (but not removed).
Resume-driven engineering.
I think it's better to think of it as a fibre channel protocol rather than TCP. It's intended for use on managed internal data centre networks. It skips OSI layers to gain speed and probably does 100% hardware routing with FPGAs.

It's of no interest on the internet or any small-scale network.

> fibre channel protocol

Apart from the fact that FC is explicitly lossless and ordered.

FC is not entirely lossless. One ticket I had the joy of dealing with involved a customer using a Fibre Channel network for their storage, with multipathd for failover. In theory it was a fully redundant configuration: dual FC ports on the server, each one going to a different FC switch all the way back to the SAN. However, the system was generating I/O errors on large writes while small writes would succeed. Needless to say, ext4 failed horribly, and there were worries that it was a kernel bug in the FC driver.

After a good amount of back and forth with the customer, and several test programs run on the system in question, I eventually came up with a hypothesis that there was an error in the write path of the SAN as small writes succeeded while larger writes failed. The customer ultimately found there was a dirty fibre on one of the links in their FC fabric. It was dirty enough to corrupt large packets, but not so dirty that smaller writes and control packets were unable to get through. Since multipathd only checks to see if a given target can be read from, it would never fail over to the other path (which was fine). So much for trying to build a high availability system using an expensive SAN!
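
A rough way to see why a marginal link can pass small frames and still kill large writes is to assume independent bit errors and compute the chance that every bit arrives intact. The sketch below is only an illustration: the bit error rate, the sizes, and the function name `intact_probability` are all made up, and a real dirty fibre will not produce perfectly independent errors.

```python
def intact_probability(num_bytes: int, bit_error_rate: float) -> float:
    """Chance that num_bytes arrive with no flipped bit.

    Assumes independent, uniformly distributed bit errors, which is a
    simplification of how a dirty fibre actually behaves.
    """
    return (1.0 - bit_error_rate) ** (num_bytes * 8)


if __name__ == "__main__":
    ber = 1e-5  # hypothetical bit error rate, chosen only for illustration
    # Hypothetical sizes: a small control frame, a larger data frame,
    # and a large write that spans many frames.
    for label, size in [("control frame", 64),
                        ("data frame", 2048),
                        ("1 MiB write", 1024 * 1024)]:
        p = intact_probability(size, ber)
        print(f"{label:>14} ({size} bytes): {p:.6f}")
```

Under these assumptions the survival chance falls off exponentially with size, so short control traffic mostly gets through while a large write almost never completes without at least one corrupted frame.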

Lesson of the story: what you think is a lossless network is not always lossless. The IP stack comes with a lot of useful diagnostic tools that you really start missing when something goes awry in a non-IP network.

Broken hardware does not make the protocol lossy. I think you're misunderstanding what 'lossless' is intended to mean in this context; it does not mean that it is error-free. In a lossy protocol, missing data is not necessarily an error. In a lossless protocol, missing data is treated as an error, which is consistent with what you experienced.
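
To make the distinction concrete, here is a toy receiver that handles a gap in sequence numbers in both modes. It is not how FC or TTPoE is implemented; `receive`, `MissingFrameError`, and the use of plain integer sequence numbers are assumptions for illustration only.

```python
class MissingFrameError(Exception):
    """Raised when a lossless receiver sees a gap in sequence numbers."""


def receive(sequence_numbers, lossless: bool):
    """Return (delivered, gaps) for an iterable of increasing sequence numbers.

    In lossless mode a gap is an error; in lossy mode it is recorded
    and the receiver carries on.
    """
    delivered, gaps = [], []
    expected = None
    for seq in sequence_numbers:
        if expected is not None and seq != expected:
            if lossless:
                raise MissingFrameError(f"expected frame {expected}, got {seq}")
            gaps.extend(range(expected, seq))  # lossy: note the gap, keep going
        delivered.append(seq)
        expected = seq + 1
    return delivered, gaps
```

With frames 0, 1, 3, 4 the lossy call returns the four frames plus a note that frame 2 is missing, while the lossless call raises. The same missing data is tolerated in one case and surfaced as an error in the other.
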
I do understand what lossless means. My anecdote is a cautionary tale: when you go off and start designing new network protocols, especially one as bare bones as TTPoE, you need to consider what happens when someone has to deal with things going wrong. Diagnostics and maintenance matter in the real world for people running large systems with thousands or millions of moving parts. IPv4 and IPv6 bring along lots of tools that help in these scenarios, and IPv4/v6 headers don't actually have all that much overhead to parse and generate in hardware. They are also protocols that have been around long enough to have many widely available hardware and software implementations, either open source or available for purchase from vendors. I'm certain that there will be times when sysadmins will be cursing the fact that the folks who implemented TTPoE didn't have a ping-like tool available from the start.
FC remains a lossless protocol; bugs in multipathd just mean we live in an imperfect world. Your initial sentence, "FC is not entirely lossless," conflates a specific networking term of art with a pedantic application of the denotative definition of the word. If your point was that immature network technologies do not have as many diagnostic tools as mature ones do, you should have made that point instead of misappropriating jargon.

Anyway, to your specific point, IP at all is basically overkill in a cluster architecture. Very few IP stacks function properly without having to get things like ARP involved; the more of this stack you can get rid of, the better performance you get and there's less to maintain. TTPoE reminds me the most of ATA over Ethernet, a previous effort to shed the complexity of a protocol designed for global networking. It worked great until you hit scaling issues, which competing tech leveraged the aforementioned complexity to address.

FC should be able to detect errors. I've had alerts shout at me when an FC switch detects a dropped packet.

Moreover, the multipath should have stopped that! It should have detected a bad link and failed over to the other one (but the config for that is hard, so I can see why that might not have worked).

Last time I checked, multipathd does not and cannot detect faults on the write path, as it only performs small reads to check the health of any given path. Checking writes would involve allocating space on the disk for multipathd to safely write to. Maybe someone has changed that in the past decade? I don't know, as I'm not involved in anything SAN related anymore (and thank goodness for that!). SAN hardware is particularly awful as the underlying network is essentially hidden from the operating system most of the time. Storage subsystems built 30 years ago were built without any consideration that they might be running on top of networks.

These and many other performance issues left me with a particular hatred of SANs.
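
For contrast with the read-only health check described above, here is a minimal sketch of what a write-path probe could look like if a scratch area were reserved for it. This is not multipathd's checker or any real tool's API; `small_read_probe`, `write_verify_probe`, and the scratch file are hypothetical, and an ordinary file stands in for a reserved region on the device.

```python
import os
import tempfile


def small_read_probe(path: str, size: int = 512) -> bool:
    """Read a few bytes; roughly the kind of check a read-only prober does."""
    try:
        with open(path, "rb") as f:
            return len(f.read(size)) > 0
    except OSError:
        return False


def write_verify_probe(scratch_path: str, size: int = 1024 * 1024) -> bool:
    """Write a large random pattern to a scratch area and read it back.

    Caveat: through the normal page cache the read-back may never touch
    the link at all; a real checker would need direct I/O or similar.
    """
    pattern = os.urandom(size)
    try:
        with open(scratch_path, "wb") as f:
            f.write(pattern)
            f.flush()
            os.fsync(f.fileno())  # push the data towards the device
        with open(scratch_path, "rb") as f:
            return f.read() == pattern
    except OSError:
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        scratch = os.path.join(d, "scratch.bin")  # stand-in for reserved space
        print("write/verify:", write_verify_probe(scratch))
        print("small read:  ", small_read_probe(scratch))
```

The scratch area is the catch mentioned above: a checker can only write safely if space has been set aside for it on every path's target.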

yeahnah, I totally feel your pain.
Elon just doesn't want to pay Nvidia for Infiniband. Lol
If it works and it's cheaper, this is a very reasonable thing to do.