| > This isn't an adequat comparison in my eyes. Your example moves a very critical component out if the "protocol logic": message parsing. (for context in case you didn't see the link in the other comments - https://gist.github.com/joshka/af299be87dbd1f64060e47227b577... is the full working implementation of this). It uses the tokio framed codec approach where message parsing effectively only succeeds once the full message is parsed. For stun that doesn't really seem like a big deal, but I can imagine that for other protos it might be. > Dealing with invalid messages is crucial in network protocols, meaning the API should be `&[u8]` instead of parsed messages. The stream is a stream of Result<ParsedMessage, ErrorMessage>, that error message is able to represent the invalid message in whatever fashion makes sense for the protocol. The framed approach to this suggests that message framing is more important to handle as a more simple effectively linear state machine of bytes. Nothing is stopping you from treating the smallest unit of the protocol as being a byte, and handling that in the protocol handler though, still just comes down to being a stream transformation from the bytes to a stream of messages or errors.. Additionally, when working with the stream approach, there's usually a way to mutate the codec (e.g. UdpFramed::codec_mut()) which allows protocol level information to inform the codec how to treat the network byte stream if necessary. > In addition, you want to be able to parse messages without copying, which will introduce lifetimes into this that will make it difficult to use an exexutor like tokio to run this future. You can use a structured-concurrency approach instead. I linked to an article from withoutboats about this at the end. The Bytes crate handles zero-copy part of this orthogonally. Lifetimes aren't necessary. PoC on this is to just add a bytes field and push the ref to the bytes representing the stun response like: let bytes = data.split_to(12).freeze();
Ok(Some(BindingResponse { address, bytes }))
> Lastly, and this is where using async falls apart: If you try to model these state machines with async, you can't use `&mut` to update shared state between them AND run multiple of these concurrently.I'm not sure that I see your point here. I think it's just you never have to think about concurrent access if there's no concurrency, but I'm sure I'm misunderstanding this somehow. Maybe you could expand on this a little? > EDIT: Just to be clear, I'd love to use the Rust compiler to generate these state machines but it simply can't do what I want (yet). I wrote some pseudo-code somewhere else in this thread of how this could be done using co-routines. That gets us closer to this but I've not seen any movement on the front of stabilising coroutines. I wonder if you'd be able to expand on an example where these constraints you mentioned above come up. The Stun example is simple enough that it is achievable just with an async state machine. I'd be curious to work through a more detailed example (mostly for interests sake). In the article you mention the runtime overhead of mutexes / channels. I'm curious what sort of percentage of the full system time this occupies. |
Fair point! Yeah that would allow error handling the state machine.
> > Lastly, and this is where using async falls apart: If you try to model these state machines with async, you can't use `&mut` to update shared state between them AND run multiple of these concurrently.
> I'm not sure that I see your point here. I think it's just you never have to think about concurrent access if there's no concurrency, but I'm sure I'm misunderstanding this somehow. Maybe you could expand on this a little?
Yeah sure! If you are up for it, you can dig into the actual code here: https://github.com/firezone/firezone/tree/main/rust/connlib/...
The requirements are:
In theory, (3) could be relaxed a bit to not be a NxM matrix. It doesn't really change much though because one TURN allocation (i.e. remote address) needs to be usable by multiple connections.All of the above is managed as a single `Node` that is composed into a larger state machine that manages ACLs and uses this library to route IP packets to and from a TUN device. That is where the concurrent access comes in: We receive ACL updates via a WebSocket and change the routing logic as a result. Concurrently, we need to read from the TUN device and work out which WireGuard tunnel the packets needs to go through. A WireGuard tunnel may use one of the TURN clients to relay the packet in case a direct connection isn't possible. Lastly, we also need to read from the UDP socket, index into the correct connection based on the IP and decrypt incoming IP packets.
In summary, there are lots of state updates happening from multiple events and the state cannot be isolated into a single task where `&mut` would be an option. Instead, we need `&mut` access to pretty much everything upon each UDP packet from the network or IP packet from the TUN device.
> The Stun example is simple enough that it is achievable just with an async state machine.
I am fully aware of that. It needed to be simple to fit into the scope of a blog post that can be understood by people without a deep networking background. To explain sans-IO, I deliberately wanted a simple example so I could focus on sans-IO itself and not explaining the problem domain. With the basics in place, people can work out by themselves whether or not it is a good fit for their problem. Chances are, if they are struggling with lots of shared state between tasks, it could be!