by hyperhopper 603 days ago
I wish the article explained how it dealt with message loss from the at-most-once Redis pub/sub channel.
3 comments

Indeed, it does deal with the message loss. I was momentarily confused because in my many thousands of bullet chess games on Lichess I haven't had much of any message loss that can be attributed to Lichess's servers (but plenty when my Internet connection is down or unstable).

I will have to take a look, because whatever it's doing, it works very well!

The at-most-once delivery could be an issue if Lichess's backend services (lila or lila-ws) crash. Presumably this is a rare enough occurrence that message loss is more of a theoretical concern.

I have no idea, but the in-house pub/sub tech at a previous job used [PGM][1] together with some hand-written brokers and a client library. The overall delivery guarantee is at-most-once, but in over ten years and across tens of thousands of machines in multiple datacenters, they never saw a single dropped message. Not sure how they measured that, but I was told the measurements were accurate.

Well, except for that one major outage where everything shit the bed due to some misconfiguration of IP multicast in the datacenters, or so I was told.

So, maybe if your mission isn't life critical, you can just wrongfully assume exactly-once delivery.

[1]: https://en.wikipedia.org/wiki/Pragmatic_General_Multicast
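
For what it's worth, one common way to measure drops on an at-most-once channel is to have each publisher stamp its messages with a sequence number and let subscribers count the gaps. I don't know whether that system (or Lichess) does this. The sketch below is only an illustration, and `GapDetector`, `observe`, and the publisher id `"broker-a"` are made-up names:

```python
from dataclasses import dataclass, field


@dataclass
class GapDetector:
    """Counts messages missed on an at-most-once channel, per publisher.

    Assumes every publisher stamps messages with a sequence number that
    increases by 1 per message (an assumption, not something from the article).
    """
    next_seq: dict = field(default_factory=dict)  # publisher id -> next expected seq
    dropped: int = 0

    def observe(self, publisher: str, seq: int) -> int:
        """Record one received message; return how many were skipped before it."""
        # The first message seen from a publisher sets the baseline.
        expected = self.next_seq.get(publisher, seq)
        missed = max(0, seq - expected)
        self.dropped += missed
        # Late or duplicate messages never move the expectation backwards.
        self.next_seq[publisher] = max(expected, seq + 1)
        return missed


detector = GapDetector()
for seq in (0, 1, 2, 5, 6):  # messages 3 and 4 never arrive
    detector.observe("broker-a", seq)
```

A counter like this only notices a loss once a later message arrives, so it cannot see a dropped final message or anything lost before the first one it receives.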

I was hoping for that too. That's the kind of interesting architectural question I wanted this article to answer.