Tesla Transport Protocol over Ethernet (TTPoE) (github.com)
167 points by super_linear 637 days ago
11 comments

Previous discussion from ~month ago, "Tesla’s TTPoE at Hot Chips 2024: Replacing TCP for Low Latency Applications":

* https://news.ycombinator.com/item?id=41374663

* https://chipsandcheese.com/2024/08/27/teslas-ttpoe-at-hot-ch...

Tesla seeks to work to standardize a new high-speed/low-latency fabric (be that TTPoE or otherwise) for AI/ML/Datacenters; however, there's nothing inherently abject about TCP as it exists today. RDMA over Converged Ethernet suffices perfectly well for whatever an "AI/ML/Datacenter" is, and if we're being fair, the lackadaisical approach to the documentation suggests that they may not be taking it as seriously as they could anyway.

If Tesla were really seeking to shake things up, they wouldn't have picked IPv4 to do it when the newest release has been around for nearly 30 years and has latency reduction baked in.

This smacks of a pandersome attempt from a company that sees the quite mandarin writing on the walls and has decided (in true Muskovite fashion) they too are just a misunderstood font of futurism.

RoCE sends huge packets down the wire.

TCP has the wrong abstraction for truly high performance.

I wouldn't necessarily standardize what Tesla does here, but most of the big companies have their own layer 3 transport protocol for things that need truly high speed and are operating within a datacenter.

Cray/HPE has their own Ethernet-based protocol (Slingshot was an earlier version of it - not sure what its name is now) which seems to be better than whatever Tesla has, but is not necessarily published.

I did a skim and didn't see any explanation of why one would want it over TCP. Did I miss it, or is it non-obvious?
From a cursory look:

- looks dead simple

- no IP layer (there's a ttpip folder in that repo though)

- distributed congestion control (TCP has a "window" field + a bunch of tentative RFCs, this has a purposeful "congestion")

- 100% implementable in hardware (TCP can, but it's complex)

Not a general TCP replacement, but the README properly highlights a "many endpoints local link" use case:

> the protocol executed entirely in hardware and deployed to a very large multi-ExaFlops (fp16) supercomputer with over 10s of thousands of concurrent endpoints. This protocol does not need a CPU or OS to be involved in any way to link and execute.

In Tesla's presentation slides, "Tesla Transport Protocol Over Ethernet (TTPoE): A New Lossy, Exa-Scale Fabric for the Dojo AI Supercomputer", they mentioned that the network layer is optional (but not removed).
Resume-driven engineering.
I think it's better to think of it as a fibre channel protocol rather than TCP. It's intended for use on managed internal data centre networks. It skips OSI layers to gain speed and probably does 100% hardware routing with FPGAs.

It's of no interest on the internet or any small-scale network.

> fibre channel protocol

Apart from FC is explicitly lossless and ordered.

FC is not entirely lossless. One ticket I had the joy of dealing with involved a customer using a Fibre Channel network for their storage using multipathd for failover. In theory it was a fully redundant configuration with dual FC ports on the server with each one going to a different FC switch all the way back to the SAN. However, the system was generating I/O errors on large writes while small writes would succeed. Needless to say that ext4 failed horribly, and there were worries that it was a kernel bug in the FC driver.

After a good amount of back and forth with the customer, and several test programs run on the system in question, I eventually came up with a hypothesis that there was an error in the write path of the SAN as small writes succeeded while larger writes failed. The customer ultimately found there was a dirty fibre on one of the links in their FC fabric. It was dirty enough to corrupt large packets, but not so dirty that smaller writes and control packets were unable to get through. Since multipathd only checks to see if a given target can be read from, it would never fail over to the other path (which was fine). So much for trying to build a high availability system using an expensive SAN!

Lesson of the story: what you think is a lossless network is not always lossless. Using the IP stack has a lot of beneficial diagnostic tools that you really start missing when something goes awry in a non-IP network.
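
A rough back-of-the-envelope sketch of why a marginal link can pass small I/O but fail large I/O: if bit errors are assumed to be independent (a simplification), the chance that a transfer is hit by at least one error grows with its size. The bit error rate and the write sizes below are made-up numbers for illustration, not measurements from the anecdote.

```python
def corruption_probability(nbytes: int, bit_error_rate: float) -> float:
    """Probability that at least one bit of an nbytes transfer is corrupted,
    assuming independent bit errors (an idealised model)."""
    bits = nbytes * 8
    return 1.0 - (1.0 - bit_error_rate) ** bits


# Hypothetical bit error rate for a dirty fibre; purely illustrative.
ASSUMED_BER = 1e-7

small_write = corruption_probability(4 * 1024, ASSUMED_BER)      # 4 KiB
large_write = corruption_probability(1024 * 1024, ASSUMED_BER)   # 1 MiB

print(f"4 KiB write hit by an error: {small_write:.1%}")
print(f"1 MiB write hit by an error: {large_write:.1%}")
```

With these assumed numbers the small write is hit roughly 0.3% of the time and the large write roughly 57% of the time, which is the same shape as "small writes succeed, large writes fail".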

Broken hardware does not make the protocol lossy. I think you're misunderstanding what 'lossless' is intended to mean in this context; it does not mean that it is error-free. In a lossy protocol, missing data is not necessarily an error. In a lossless protocol, missing data is treated as an error, which is consistent with what you experienced.
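
To make the distinction concrete, here is a minimal sketch of a receiver that sees a gap in sequence numbers. It is not taken from FC or TTPoE; the `receive` function, the `(seq, payload)` frame shape and `MissingDataError` are all hypothetical, and it assumes frames arrive in order with only drops.

```python
class MissingDataError(Exception):
    """Raised by the lossless receiver when a sequence gap is seen."""


def receive(frames, lossless):
    """Deliver payloads from an iterable of (seq, payload) pairs.

    Lossless mode treats a gap as an error; lossy mode skips the gap
    and keeps delivering whatever did arrive.
    """
    expected = 0
    delivered = []
    for seq, payload in frames:
        if seq != expected and lossless:
            raise MissingDataError(f"expected seq {expected}, got {seq}")
        # In lossy mode a gap is not an error, so delivery simply continues.
        delivered.append(payload)
        expected = seq + 1
    return delivered


frames = [(0, "a"), (1, "b"), (3, "d")]  # seq 2 was dropped on the wire

print(receive(frames, lossless=False))
try:
    receive(frames, lossless=True)
except MissingDataError as exc:
    print("lossless receiver reported:", exc)
```

Either way the data is gone; the difference is only whether the protocol reports it as a fault.
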
I do understand what lossless means. The point of my anecdote is a tale of warning: when you go off and start designing new network protocols, especially one as bare bones as TTPoE, you need to consider what happens when someone has to deal with things going wrong. Diagnostics and maintenance matter in the real world for people running large systems with thousands or millions of moving parts. IPv4 and IPv6 bring along lots of tools that help in these scenarios, and IPv4/v6 headers don't actually have all that much overhead to parse and generate in hardware. They are also protocols that have been around long enough to have many widely available hardware and software implementations, either open source or available for purchase from vendors. I'm certain that there will be times when sysadmins will be cursing the fact that the folks who implemented TTPoE didn't have a ping-like tool available from the start.
FC should be able to detect errors. I've had alerts shout at me when an FC switch detects a dropped packet.

Moreover, the multipath should have stopped that! It should have detected a bad link and failed over to the other one (but the config for that is hard, so I can see why that might not have worked).

Last time I checked, multipathd does not and cannot detect faults on the write path as it only performs small reads to check the health of any given path. Checking writes would involve allocating space on the disk for multipathd to safely write to. Maybe someone has changed that in the past decade? I don't know as I'm not involved in anything SAN related anymore (and thank goodness for that!). SAN hardware is particularly awful as the underlying network is essentially hidden from the operating system most of the time. Storage subsystems built 30 years ago were built without any consideration that they might be running on top of networks.

These and many other performance issues left me with a particular hatred of SANs.
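
For anyone who wants to see the failure mode in miniature, here is a toy simulation of a health check that only does small reads, as described above. `SimulatedPath` and `small_read_check` are invented for this sketch and are not multipathd's actual code or API; the 4 KiB threshold is an arbitrary assumption.

```python
class SimulatedPath:
    """Toy stand-in for one SAN path. I/O above max_good_io_bytes fails,
    mimicking a link that only corrupts large transfers."""

    def __init__(self, max_good_io_bytes):
        self.max_good_io_bytes = max_good_io_bytes

    def read(self, nbytes):
        if nbytes > self.max_good_io_bytes:
            raise OSError("I/O error on read")
        return bytes(nbytes)

    def write(self, data):
        if len(data) > self.max_good_io_bytes:
            raise OSError("I/O error on write")
        return len(data)


def small_read_check(path, probe_bytes=512):
    """Health check in the style described above: one small read only."""
    try:
        path.read(probe_bytes)
        return True
    except OSError:
        return False


dirty_path = SimulatedPath(max_good_io_bytes=4096)  # assumed threshold

print("checker says path is healthy:", small_read_check(dirty_path))
try:
    dirty_path.write(bytes(1024 * 1024))
except OSError as exc:
    print("1 MiB write failed anyway:", exc)
```

The checker keeps reporting the path as healthy, so nothing ever triggers a failover to the good path.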

Elon just doesn't want to pay Nvidia for Infiniband. Lol
If it works and it's cheaper, this is a very reasonable thing to do.
There was a talk about this prior. This was used in place of TCP, but where TCP is designed to run over unreliable networks, this protocol achieves speed and latency figures comparable to others, while still being able to retain commodity IP switches in the cluster. By having a fixed buffer, no lingers, faster opens, they increase the speed and latency, without going to dedicated vendors or other stacks.
> they increase the speed and latency

I suppose you mean "increase the speed and decrease the latency"?

Yes. Typo.
Be interesting to see how this stacks up to the dominant protocol in supercomputers/ai clusters : Infiniband.
AFAICT this is very much about handling unreliable links and congestion control.

Infiniband instead makes the sides bargain to avoid packet loss, while the medium is supposed to be reliable.
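
The "bargaining" is credit-based flow control: the sender may only transmit while it holds credits, and the receiver hands credits back as it frees buffer space, so nothing is dropped for lack of a buffer. A minimal sketch of the idea follows; the `Sender` and `Receiver` classes are hypothetical and far simpler than anything in the real InfiniBand link layer.

```python
from collections import deque


class Receiver:
    def __init__(self, buffer_slots):
        self.buffer_slots = buffer_slots
        self.buffer = deque()

    def initial_credits(self):
        # One credit per free buffer slot.
        return self.buffer_slots

    def accept(self, frame):
        if len(self.buffer) >= self.buffer_slots:
            raise RuntimeError("sender transmitted without a credit")
        self.buffer.append(frame)

    def drain(self, n):
        """Consume up to n buffered frames; return the credits freed."""
        freed = 0
        while self.buffer and freed < n:
            self.buffer.popleft()
            freed += 1
        return freed


class Sender:
    def __init__(self, credits):
        self.credits = credits
        self.pending = deque()

    def queue(self, frame):
        self.pending.append(frame)

    def grant(self, n):
        self.credits += n

    def pump(self, receiver):
        """Send queued frames while credits remain; return how many went out."""
        sent = 0
        while self.pending and self.credits > 0:
            receiver.accept(self.pending.popleft())
            self.credits -= 1
            sent += 1
        return sent


rx = Receiver(buffer_slots=2)
tx = Sender(credits=rx.initial_credits())
for i in range(5):
    tx.queue(f"frame-{i}")

first = tx.pump(rx)       # limited by the two initial credits
stalled = tx.pump(rx)     # no credits left, so nothing is sent
tx.grant(rx.drain(1))     # receiver frees one slot and returns one credit
resumed = tx.pump(rx)
print(first, stalled, resumed, len(tx.pending))
```

The sender stalls instead of overrunning the receiver, which is why the medium itself is expected to be reliable.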

> Be interesting to see how this stacks up to the dominant protocol in supercomputers/ai clusters : Infiniband.

As mentioned in README, this was submitted to the larger Ultra Ethernet consortium for consideration:

> Deliver an Ethernet based open, interoperable, high performance, full-communications stack architecture to meet the growing network demands of AI & HPC at scale

* https://ultraethernet.org

I thought infiniband was more expensive and that even AI where bandwidth is super important was trying to get away from it towards cheaper options.
Is it missing a license?
The header files all say it's GPL 2. But yes, they should have a license file at the top level.
How is this better than UDP? Or for that matter, just plain old Ethernet MAC addressing? You can achieve lower latency and speed (than this) if you don't care about reliability in your transport layer.

This reeks of NIH.

I worked with a company that wrote its own protocol for Ethernet and got almost wire speed. It was worth it at 10 Mbps, but not worth it at 100 Mbps.

You can always beat general purpose solutions like the TCP/IP/UDP stack if you try. For most it isn’t worth it.

Did you even try reading the README?

- TTPoE is designed to be implemented at hardware level unlike UDP

- UDP cannot guarantee transmission whereas this does

- TTPoE is built for distributed resilience

> Some variables may have changed slightly without documentation updates, but we're sure you can figure it out

I hope they're not hoping for mass adoption with an attitude like that. Not exactly inspiring confidence in the longevity and maintainability.

I don’t think mass adoption is their goal. They had a problem. They solved that problem. They shared how they solved said problem.

Every engineering company releases stuff like this. It’s not meant to change the world. It’s marketing to recruit other engineers who would find that problem interesting.

> I don’t think mass adoption is their goal.

I'm not so sure about that. From the repo:

> Tesla also announced joining the Ultra Ethernet Consortium (UEC) to share this protocol and work to standardize a new high-speed/low-latency fabric (be that TTPoE or otherwise) for AI/ML/Datacenters

Also it's a protocol, personally I will only use a protocol that's fully spec'd. It's a pain sometimes to have consensus among all contributors but it's valuable.

> edit : I will only use a protocol that's fully spec'd IN PROD

Nah, it's the base thing that a standard can be built out of. This is how things usually get done.
Yup, for all the specs I've contributed towards, I should have just said : "but we're sure you can figure it out"

That's how things usually get done right ?

At least they're honest about it.

This is currently the state of much modern documentation from huge tech companies.

It feels more like a way to sweep liability away rather than a real warning..

..which also does not inspire confidence.

Why does this not inspire confidence in Tesla? Their internal software stack is available to their own developers who can review what is actually there.

Why does it have to be perfectly documented in a public GitHub repo? Are all other car companies "properly" publicly documenting things on GitHub?

Does it inspire more confidence in VW's software stack if they don't share it? Is VW's confidential stack some big competitive advantage? I've used a VW ID electric vehicle. I did not come away that impressed.

Because Tesla and Musk bad... or something of the sort.

This is the way it goes here in HN for anything related to Musk.

> Because Tesla and Musk bad... or something of the sort.

No, spec bad. Protocol unknown. Poof

edit: > This is the way it goes here in HN for anything related to Musk.

Nobody mentioned Musk ... Except you.

Twice now I’ve been excited that this was for realtime Ethernet used in Tesla's vehicles. Alas, it is not.
Any reason to believe they don't use one of the standard industrial protocols like the poorly named EtherNet/IP?
Licensing probably?

CAN (or one of its more modern variants) is historically more common in automotive. However, with 2-wire Ethernet connections becoming more commonplace, I do think you're right that more and more cars will be moving to Ethernet fieldbus.

EtherNet/IP is not as robust for many applications as its competitors (PROFINET, EtherCAT) since it is not fully deterministic. EtherCAT is my personal favorite.

+1 - ethercat and profinet are the way.

Random guessing - EtherCAT seems more likely to take over for CAN because CoE (CANopen over EtherCAT) is so common.

It's very easy to turn CAN devices into ethercat ones.

Harder to turn them into profinet ones.

Seems like a more incremental path for car makers.

Otherwise the main advantage of PROFINET is that you can treat it like regular Ethernet (i.e. switches, etc.), but I'm not sure anyone cares in a car.

Please no EIP, it's utter crap and designed by an OOP huffing committee. The only serious protocol is EtherCAT with honorable mentions for Sercos 3 and Ethernet Powerlink (CANopen over Ethernet).
Of all the (current) industrial protocols they could have picked, Ethernet/IP would be the worst.

Its only advantage is that it can coexist with other TCP traffic and run over standard switches, but that just results in unreliable fieldbus performance.

Really interesting
Why?
Recreating foundational infra doesn't seem so common, especially for a car company.
In a sense this wasn't from Tesla the car company, but Tesla the IT department with a supercomputer. I don't know what they do on it though, might be lots of physics simulations (aerodynamics etc) or deep learning for assisted driving tech.
They train an end-to-end model to drive based on 8 camera streams and recorded input from human drivers, training on tens (if not hundreds now) of millions of 30-second clips from their consumer fleet. That's why they've bought one of the largest GPU clusters and are making their own chips and transport protocols.

It's not widely known, but Tesla probably has one of the largest training clusters, because practically all the GPUs they buy go towards training, while most of the GPUs for e.g. OpenAI go towards inference. Tesla does inference in the car.

In older interviews Musk said that the Dojo is intended for deep learning.

So most likely that. I agree that this seems to have very little to do with cars.

CAN, MOST, Flexray, LIN, K-Line were all invented for automotive use.

2 wire Ethernet is also a thing that they spearheaded.

[flagged]
Can you please not post like this? Regardless of who you're talking about or how you feel about them, it's not what this site is for, and destroys what it is for.

If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.