by Matthias247 3482 days ago
There's another important question: how will the clients deal with a notification that was never delivered? Does that mean they may never receive the chat message at all? In some cases that could be catastrophic for the user. Or does it only mean that they may not get it instantly? That would not be too bad if the client also polls the server or catches up on missed notifications when it reconnects.
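Here is a minimal sketch of the second option, where a lost push only delays a message. The client treats every push as a hint and pulls from a cursor, both when a push arrives and when it reconnects. `InMemoryServer`, `Client`, `messages_after` and the integer sequence cursor are hypothetical stand-ins. They are not part of FCM, APNS or any real chat backend.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    seq: int  # server-assigned, strictly increasing sequence number (assumption)
    text: str


@dataclass
class InMemoryServer:
    """Hypothetical stand-in for the chat backend's message log."""
    log: list[Message] = field(default_factory=list)

    def append(self, text: str) -> Message:
        msg = Message(seq=len(self.log) + 1, text=text)
        self.log.append(msg)
        return msg

    def messages_after(self, cursor: int) -> list[Message]:
        # Everything the client has not seen yet, in order.
        return [m for m in self.log if m.seq > cursor]


@dataclass
class Client:
    server: InMemoryServer
    last_seq: int = 0  # cursor: highest sequence number already received
    inbox: list[Message] = field(default_factory=list)

    def sync(self) -> int:
        """Pull every message after the cursor; return how many were new."""
        new = self.server.messages_after(self.last_seq)
        self.inbox.extend(new)
        if new:
            self.last_seq = new[-1].seq
        return len(new)

    def on_push(self) -> None:
        # The push payload is not trusted; it only signals "go sync".
        self.sync()

    def on_reconnect(self) -> None:
        # Catch up on anything whose push notification was lost.
        self.sync()


server = InMemoryServer()
client = Client(server)
server.append("hi")
client.on_push()                 # this push arrived
server.append("are you there?")  # pushes for these two are lost
server.append("hello?")
client.on_reconnect()            # the catch-up recovers both
print([m.text for m in client.inbox])  # ['hi', 'are you there?', 'hello?']
```

Because the push carries no state the client relies on, losing one changes only *when* the message shows up, not *whether* it does.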

When the push notifications hit FCM, Firebase does not guarantee delivery of those messages to clients (usually iOS or Android devices). There are quite a few reasons why FCM/APNS might fail to deliver a message, so applications almost never make core functionality depend on them.

As you say, you might not get the notification pushed to the device, but you should still see the message if you open the messaging app as normal.

This is indeed the case. Our real-time system sits outside of Firebase and APNS, and it handles the actual real-time updates of chat state once the app is launched. We also have a delivery system that accounts for network cuts, network switches and the like.
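One common way a real-time channel can survive network cuts and switches is to use per-conversation sequence numbers. The client applies messages strictly in order, ignores duplicates from resends, and asks the server for any range it missed. The poster does not say how their delivery system works, so this is only a sketch of that general scheme. `Receiver` and `fetch_range` are hypothetical names.

```python
from typing import Callable

# Hypothetical callback: fetch_range(first, last) -> [(seq, text), ...] from the server.
FetchRange = Callable[[int, int], list[tuple[int, str]]]


class Receiver:
    """Applies messages strictly in sequence order, filling gaps on demand."""

    def __init__(self, fetch_range: FetchRange) -> None:
        self.next_seq = 1                  # next sequence number we expect
        self.pending: dict[int, str] = {}  # arrived early, waiting for the gap to fill
        self.delivered: list[str] = []     # what the UI would show, in order
        self.fetch_range = fetch_range

    def receive(self, seq: int, text: str) -> None:
        if seq < self.next_seq or seq in self.pending:
            return  # duplicate, e.g. a resend after a network switch
        self.pending[seq] = text
        if seq > self.next_seq:
            # Gap detected: ask the server for the messages we missed.
            for s, t in self.fetch_range(self.next_seq, seq - 1):
                self.pending.setdefault(s, t)
        self._drain()

    def _drain(self) -> None:
        # Release every message that is now contiguous with what was delivered.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


server_log = {1: "a", 2: "b", 3: "c", 4: "d"}
fetch_calls = []


def fetch_range(first: int, last: int) -> list[tuple[int, str]]:
    fetch_calls.append((first, last))
    return [(s, server_log[s]) for s in range(first, last + 1)]


rx = Receiver(fetch_range)
rx.receive(1, "a")
rx.receive(4, "d")  # 2 and 3 were lost during a network switch
rx.receive(3, "c")  # late duplicate, ignored
print(rx.delivered, fetch_calls)  # ['a', 'b', 'c', 'd'] [(2, 3)]
```

If the range fetch fails, the early message simply stays in `pending`, and the next arrival triggers another attempt.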
Sounds like delayed delivery of messages?

If the buffers are filling faster than the servers can clear them, then you're headed for "catastrophic" failure anyway and notifications are going to get dropped regardless. You can handle it more or less gracefully while you spin up more capacity.
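Handling it gracefully usually means putting a hard bound on the buffer and shedding load once it fills, rather than letting memory grow until the process falls over. This minimal sketch assumes that best-effort notifications can simply be discarded and counted. The counter could feed alerting or autoscaling. `NotificationBuffer` and its methods are hypothetical and built on the standard-library `queue.Queue`.

```python
import queue


class NotificationBuffer:
    """Hypothetical bounded buffer that sheds load instead of growing without limit."""

    def __init__(self, capacity: int) -> None:
        self._q: queue.Queue = queue.Queue(maxsize=capacity)
        self.dropped = 0  # could be exported as a metric, e.g. to trigger scaling up

    def offer(self, notification) -> bool:
        """Enqueue without blocking; drop the notification if the buffer is full."""
        try:
            self._q.put_nowait(notification)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def drain(self, max_items: int) -> list:
        """Take up to max_items notifications for the sender workers."""
        batch = []
        while len(batch) < max_items:
            try:
                batch.append(self._q.get_nowait())
            except queue.Empty:
                break
        return batch


buf = NotificationBuffer(capacity=3)
accepted = [buf.offer(f"notify-{i}") for i in range(5)]  # burst larger than capacity
print(accepted, buf.dropped, buf.drain(10))
# [True, True, True, False, False] 2 ['notify-0', 'notify-1', 'notify-2']
```

This version drops the newest arrivals. Dropping the oldest instead, or dropping by priority, are equally valid choices depending on which notifications matter most.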

The other choice is to always keep more spare capacity around, but that can get expensive if, as they describe, these drastic peaks happen once a month.

I like to distinguish two categories of "catastrophic". The one you mention is that the service can't keep up with the demand and has to take some action to stay alive and not crash. That's an important point which should be incorporated into the design.

The other category, the one I meant, is that in such situations the system should not fall into inconsistent or weird behavior which is catastrophic from the end user's point of view (not the system's). For example, in a chat application, user A sends a message to B and his client gets an acknowledgement that the message has been delivered. If the server under high load then simply drops the message before forwarding it to B, B might never get it. A sees that something was delivered while in reality it was not. If A depends on that information, it might well be catastrophic for him.

All in all, you should have a complete system design that behaves deterministically for the users even under high pressure. For example, user A only gets an ACK that the message was sent after it has been persisted on the server in some form. If the server can't deliver the message to the other client because it was dropped, it is still marked somewhere for a later retry. Alternatively, the client fetches it later through some poll operation.
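Here is a minimal sketch of that design. It assumes a single server with one SQLite table that acts as both the message store and the retry list. `send` returns the ACK (the row id) only after the insert is committed. A failed forward leaves the row marked undelivered for `deliver_pending` to retry, and `poll` lets a client fetch anything after its last seen id. `ChatServer`, the schema and the `transport` callback are all hypothetical. A real system would also need recipient-side acknowledgements and retry backoff.

```python
import sqlite3
from typing import Callable

# Hypothetical transport: returns True if the recipient's connection accepted the message.
Transport = Callable[[str, str], bool]


class ChatServer:
    """Hypothetical server that only ACKs a message once it is durably stored."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self.db = sqlite3.connect(db_path)
        with self.db:
            self.db.execute(
                "CREATE TABLE IF NOT EXISTS messages ("
                " id INTEGER PRIMARY KEY AUTOINCREMENT,"
                " recipient TEXT NOT NULL,"
                " body TEXT NOT NULL,"
                " delivered INTEGER NOT NULL DEFAULT 0)"
            )

    def send(self, recipient: str, body: str) -> int:
        """Persist first; the returned id is the ACK and exists only after commit."""
        with self.db:  # commits on success, rolls back on an exception
            cur = self.db.execute(
                "INSERT INTO messages (recipient, body) VALUES (?, ?)",
                (recipient, body),
            )
        return cur.lastrowid

    def deliver_pending(self, transport: Transport) -> int:
        """Try to forward every undelivered message; failures stay marked for retry."""
        rows = self.db.execute(
            "SELECT id, recipient, body FROM messages WHERE delivered = 0 ORDER BY id"
        ).fetchall()
        sent = 0
        for msg_id, recipient, body in rows:
            if transport(recipient, body):
                with self.db:
                    self.db.execute(
                        "UPDATE messages SET delivered = 1 WHERE id = ?", (msg_id,)
                    )
                sent += 1
        return sent

    def poll(self, recipient: str, after_id: int = 0) -> list[tuple[int, str]]:
        """Fallback pull: everything for this recipient after the client's last seen id."""
        return self.db.execute(
            "SELECT id, body FROM messages WHERE recipient = ? AND id > ? ORDER BY id",
            (recipient, after_id),
        ).fetchall()


server = ChatServer()
ack_id = server.send("B", "see you at 5")  # A only receives this after the commit


def overloaded(recipient: str, body: str) -> bool:
    return False  # simulates the forward being dropped under load


inbox = []


def healthy(recipient: str, body: str) -> bool:
    inbox.append((recipient, body))
    return True


first_try = server.deliver_pending(overloaded)  # nothing sent, row stays pending
retry = server.deliver_pending(healthy)         # retried once load has dropped
print(ack_id, first_try, retry, server.poll("B"))  # 1 0 1 [(1, 'see you at 5')]
```

The key ordering is commit before ACK. Even if every forward attempt fails, the message still exists on the server and can reach B through a retry or a poll.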