by hnedeotes 1917 days ago

That link is talking specifically about the bare send (!) in a distributed setting. Locally (within the same VM) Erlang guarantees delivery, and it guarantees ordering as well: if a single process does send msg1 followed by send msg2, msg1 is guaranteed to be "received" first, provided the receiver is alive. Outside the same VM, even on the same host, a send can fail (someone closed the socket the other VM is on, for example).

Erlang also bakes in OTP, a library for messaging semantics and process behaviours (which processes have to implement to be OTP-compliant). OTP introduces the concept of a "call": a unique reference is created for the message being sent, and the "call" is considered complete only when the receiver processes the message and "replies" (with the answer and the reference). That lets you be sure the message was processed up to the point of sending that reply. This is the solution that the "ack" mentioned in the linked doc refers to. It's not inherent in send because send is async, and the only way for the sender to know the message arrived is to wait for an ack.

(You can implement the call semantics with plain processes, but it's such a common need that OTP bakes it in at a lower level for the process behaviours it includes, mostly the gen_* behaviours.)
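To make the "call" idea concrete, here is a minimal sketch of it with plain processes. It is an illustration, not OTP's actual code: the module name call_sketch, the message shapes {call, From, Ref, Request} and {reply, Ref, Reply}, and the echo server are all made up for the example. It reuses the monitor reference as the unique tag for the request.

```erlang
-module(call_sketch).
-export([start/0, call/3]).

%% Start a toy server that answers every request with {echo, Request}.
start() ->
    spawn(fun loop/0).

loop() ->
    receive
        {call, From, Ref, Request} ->
            %% The reply carries the caller's reference back: this is the "ack".
            From ! {reply, Ref, {echo, Request}},
            loop()
    end.

%% Synchronous call built on the async send (!).
call(Pid, Request, Timeout) ->
    %% The monitor reference doubles as the unique tag for this request.
    Ref = erlang:monitor(process, Pid),
    Pid ! {call, self(), Ref, Request},
    receive
        {reply, Ref, Reply} ->
            %% Reply with our reference: the request was processed.
            erlang:demonitor(Ref, [flush]),
            {ok, Reply};
        {'DOWN', Ref, process, _Pid, Reason} ->
            %% Reason is noproc for a dead or non-existent process and
            %% noconnection when the remote node cannot be reached.
            {error, Reason}
    after Timeout ->
        erlang:demonitor(Ref, [flush]),
        {error, timeout}
    end.
```

From a shell, `call_sketch:call(call_sketch:start(), ping, 1000)` should come back as `{ok, {echo, ping}}`, while calling a pid that has already exited returns `{error, noproc}` without waiting for the timeout.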

All of this breaks down in distributed settings because delivery is physically impossible to guarantee there. Your message may be received but the answer may never make it back, because the network glitched or the hardware blew up before the response was sent. These are problems of distributed systems in general, though. If you get a reply from a call, you can be sure the message was received. You still need to take care (or not) of some of the failure modes according to your requirements (be it idempotency, retry logic, nodes behaving as queue processors, etc.). Some failures carry the reason as well: for instance, a failed call to a non-existent pid on a functioning, reachable node looks different from a failed call to an unreachable node. But a node that went down and a node that is alive but unreachable are impossible to tell apart without additional machinery.
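As a rough sketch of the retry side of this, the wrapper below treats the exit reasons of gen_server:call/3 differently. The module retry_sketch and the function call_with_retry/4 are hypothetical, not part of OTP, and the policy (give up on noproc, retry on timeout and nodedown) is just one possible choice. Since a timeout or nodedown leaves the outcome unknown, this only makes sense if the server handles Request idempotently.

```erlang
-module(retry_sketch).
-export([call_with_retry/4]).

%% Hypothetical helper: retries only when the outcome is unknown, so the
%% request must be safe to process more than once.
call_with_retry(_Server, _Request, _Timeout, 0) ->
    {error, retries_exhausted};
call_with_retry(Server, Request, Timeout, Retries) when Retries > 0 ->
    try gen_server:call(Server, Request, Timeout) of
        Reply -> {ok, Reply}
    catch
        %% The target process does not exist: treated as definitive here.
        exit:{noproc, _} ->
            {error, noproc};
        %% No reply in time: the request may or may not have been handled.
        exit:{timeout, _} ->
            call_with_retry(Server, Request, Timeout, Retries - 1);
        %% Node unreachable: it may be down, or alive behind a broken link.
        exit:{{nodedown, _Node}, _} ->
            call_with_retry(Server, Request, Timeout, Retries - 1)
    end.
```

Any other exit reason is deliberately left to propagate, so an unexpected crash is not silently retried.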