by tdrp 1893 days ago
We've also run an app with 1 million+ users for a few years, and I can confirm that notifications are a wreck on both platforms, but IMO even more so on Android/FCM.

In fact, most of our calculations show that we have never gone past 85% delivery of high-pri Android notifications to valid FCM tokens. By high-pri here we mean direct chat messages in an active 1-on-1 chat conversation. This has caused many users to abandon the app, since they assume the system is unreliable. One of the main reasons seems to be that many Android phone vendors have simply whitelisted WhatsApp, Facebook and a couple of other famous apps, and dumped every other app into some kind of "kill if in background and completely forget about it" bucket. In some cases it can be fixed by asking users to wander deep into the phone settings and toggle some switches.

We brought the delivery rate from the low 80s to 85% by literally buying a bunch of the phone models we had heard had issues, trying to repro what was happening, and then popping up custom instructions for users like "looks like you are running this kind of phone, which in general will not deliver notifications, please go to settings -> blah". But the whole thing has been a gigantic game of cat and mouse that really shouldn't be in the hands of individual app developers. In many cases the FCM system does not return any error code.
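The comment doesn't say how the 85% figure is measured. One plausible approach (an assumption, consistent with the later mention of apps calling their own webhooks from onMessageReceived) is to have the app report each received message back to the backend, then divide confirmed by sent and break it down by manufacturer to see which phones to buy and test. `SendRecord` and its fields are hypothetical names for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SendRecord:
    """One high-priority message sent to a valid FCM token (hypothetical schema)."""
    message_id: str
    manufacturer: str  # e.g. reported by the app when it registered its token
    confirmed: bool    # True if the app reported receipt back to our backend


def delivery_rates(records):
    """Return (overall_rate, {manufacturer: rate}) based on app-side confirmations."""
    totals = defaultdict(int)
    confirmed = defaultdict(int)
    for r in records:
        totals[r.manufacturer] += 1
        if r.confirmed:
            confirmed[r.manufacturer] += 1
    overall_total = sum(totals.values())
    overall = sum(confirmed.values()) / overall_total if overall_total else 0.0
    per_vendor = {m: confirmed[m] / totals[m] for m in totals}
    return overall, per_vendor


records = [
    SendRecord("m1", "Google", True),
    SendRecord("m2", "Google", True),
    SendRecord("m3", "VendorX", True),
    SendRecord("m4", "VendorX", False),
]
overall, per_vendor = delivery_rates(records)
# Worst vendors first: candidates for buying a test device and writing custom instructions.
worst_first = sorted(per_vendor.items(), key=lambda kv: kv[1])
```

Note that this only measures what the app can observe: a message the OS drops before the app process wakes up simply never produces a confirmation, which is exactly the failure mode described above.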

I'd be curious to hear if anyone here has achieved 95%+ FCM delivery rates on Android apps.


Until last week I would have been confused by this, but then my Pixel 4 broke.

While waiting for it to be repaired, I bought a cheap Chinese phone to use as a backup - not sure of the brand but it cost £200.

I don't know what the fuck they did to the operating system, but notifications are completely unreliable. I think there's possibly some aggressive app termination going on, maybe for battery or memory usage. Either way, most of the time I would only get a notification once I opened the app. Some apps seem to be more reliable than others, maybe due to whitelisting as you say.

In hindsight, Google allowing these no-QA phones to use the Android branding was a big mistake.

Killing idle apps, including Services, is a feature in Android to conserve battery life and to give memory to Activities in more recent use. This might not be as noticeable on high-end phones.

A lot of shittily written apps are not aware of this and think they can run some while loop in the background to poll their own backend 10 times per second. Eventually that will get killed.

The solution to this is the push APIs or, for apps that truly need a background loop, presenting a persistent notification, aka a "foreground service", so the user is aware that something heavy is going on in the background. Hearing above that FCM, the recommended push solution, has delivery issues is not very comforting.

I'm developing an app with critical realtime notifications. Do you mind if I ask you a few questions?

1. Does your 85% delivery rate apply to FCM "notification messages" (displayed to user) or only to "data messages" (processed by app)?

2. Do some devices fail to deliver messages when your app is in the foreground? And does it happen often with US-based devices?

3. When the app is in the background, how long does it take for it to stop receiving notifications? I know the answer must vary by device.

4. How do you determine delivery rate of notification messages? I believe those are handled entirely by Android OS and app code runs only when they are tapped. I'm using Flutter and have not yet dug into the details of how Android processes and displays notifications.

5. Do you use FCM to deliver APNS messages?

6. How reliable are APNS messages to a foreground app for US users? I'm considering launching iPhone-only.

My app helps people make appointments (dates). It uses notifications to remind users of their upcoming and imminent appointments. The backend detects when a user has not tapped a critical notification and sends them an SMS. This should reach the user even if they didn't see the notifications or they uninstalled the app without cancelling their appointments.
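The escalation rule described here (send an SMS when a critical notification goes untapped) can be sketched as a small decision function the backend runs periodically. This is only an illustration under assumptions: the `CriticalReminder` schema and the 15-minute `TAP_GRACE` window are hypothetical, and the rule of not texting once the appointment has started is a design choice, not something the poster specified.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class CriticalReminder:
    """A critical appointment reminder (hypothetical schema)."""
    user_id: str
    appointment_at: datetime
    push_sent_at: datetime
    tapped_at: Optional[datetime] = None    # set when the app reports a tap
    sms_sent_at: Optional[datetime] = None  # set by the caller after texting


# Assumed grace period: how long to wait for a tap before escalating to SMS.
TAP_GRACE = timedelta(minutes=15)


def needs_sms(reminder: CriticalReminder, now: datetime) -> bool:
    """Escalate if the push went untapped past the grace period, no SMS has
    been sent yet, and the appointment hasn't already started."""
    if reminder.tapped_at is not None or reminder.sms_sent_at is not None:
        return False
    if now >= reminder.appointment_at:
        return False
    return now - reminder.push_sent_at >= TAP_GRACE
```

Because the decision depends only on backend state (was a tap reported?), it also covers users who uninstalled the app: their reminders are never tapped, so they fall through to SMS.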

I also implemented real-time chat using FCM data messages with a slow poll fallback. If data messages are not reliable, then I will need to replace the slow poll with a realtime push service. That will be a significant cost in added complexity. I would like to put off that work.
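One way to combine FCM data messages with a slow poll so that a lost push is eventually recovered, without showing duplicates, is sketched below. It assumes (hypothetically) that the server gives every chat message a monotonically increasing sequence number and offers a "fetch everything after N" call; `ChatSync` and `fetch_since` are illustrative names, not part of any real SDK.

```python
from typing import Callable, Dict, List, Tuple

Message = Tuple[int, str]  # (server-assigned sequence number, text): assumed shape


class ChatSync:
    """Merge messages arriving via push with a slow-poll fallback.

    Push messages are stored immediately but never advance the poll cursor,
    so a message whose push was dropped is still picked up by the next poll.
    """

    def __init__(self, fetch_since: Callable[[int], List[Message]]) -> None:
        self._fetch_since = fetch_since  # hypothetical backend call
        self._messages: Dict[int, str] = {}
        self._cursor = 0  # highest sequence number returned by a poll

    def on_push(self, msg: Message) -> bool:
        """Handle an FCM data message; return True if it was new."""
        seq, text = msg
        if seq in self._messages:
            return False
        self._messages[seq] = text
        return True

    def poll(self) -> List[Message]:
        """Fetch everything after the cursor; return only what push missed."""
        new: List[Message] = []
        for seq, text in self._fetch_since(self._cursor):
            if seq not in self._messages:
                self._messages[seq] = text
                new.append((seq, text))
            self._cursor = max(self._cursor, seq)
        return new

    def transcript(self) -> List[str]:
        return [self._messages[s] for s in sorted(self._messages)]
```

With this split, push only affects latency; correctness rests on the poll, so unreliable data messages degrade the chat to "slow" rather than "lossy".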

I'm a solo bootstrapper. Working with Android and FCM has been daunting. Would you be interested in mentoring me a few hours a month? I can pay you. My email is in my profile.

There are some well-intended reasons for this, but unfortunately, they've resulted in many problems that don't have simple solutions.

To help increase battery life, both Apple and Google provide a standard notification backend that keeps a single background connection open per device and routes notifications for every app on it. Apple and Google require the use of their systems for push notifications (APNS and FCM, respectively).
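The battery-saving idea is that many apps share one connection instead of each holding its own. The toy model below only illustrates that routing idea; it is not how FCM or APNS are implemented internally, and `PushGateway` and its methods are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple


class PushGateway:
    """Toy model: each device holds ONE connection to the gateway, and every
    app on that device receives its pushes over it."""

    def __init__(self) -> None:
        # Push token -> (device_id, app_id); each app install gets its own token.
        self._tokens: Dict[str, Tuple[str, str]] = {}
        self._devices: Set[str] = set()
        # device_id -> (app_id, payload) pairs queued on that device's connection.
        self._queues: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)

    def register(self, token: str, device_id: str, app_id: str) -> None:
        self._tokens[token] = (device_id, app_id)
        self._devices.add(device_id)

    def send(self, token: str, payload: str) -> None:
        """App backends address a token; the gateway maps it to a device."""
        device_id, app_id = self._tokens[token]
        self._queues[device_id].append((app_id, payload))

    def connection_count(self) -> int:
        """One connection per device, regardless of how many apps it has."""
        return len(self._devices)

    def drain(self, device_id: str) -> Dict[str, List[str]]:
        """Flush the device's connection; the OS then hands each payload
        to the app it is addressed to."""
        by_app: DefaultDict[str, List[str]] = defaultdict(list)
        for app_id, payload in self._queues.pop(device_id, []):
            by_app[app_id].append(payload)
        return dict(by_app)
```

The model also shows why OEM process killing hurts: the shared connection may deliver a payload fine, but if the OS refuses to hand it to (or wake) the target app, the message is lost at the last step.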

BUT... Google's services are blocked in mainland China. So, developers in that region have resorted to building their own push notification systems. No single standard has emerged, so apps may use several third-party FCM alternatives or develop their own entirely.

Inevitably, if you live in China and you have hundreds of apps on your device, with each one wanting to keep its own background connection open, your battery won't last more than a few hours.

As a result, OEMs in China modify the operating system on devices they sell to prevent apps from having background connections open. Users are expected to manually go into their settings and allow specific applications to keep connections open. Sometimes OEMs will have a preset list of popular apps that are automatically included.

If one of these devices is sold outside of China, it sees FCM in the same way as any other background connection and kills it.

Now here's where it gets crazy:

1. Many devices sold on eBay and in other channels are devices that were intended to only be sold in China. The seller modifies the operating system to add Google's services to it and changes the default language to English. These devices tend to be particularly problematic because sellers intentionally mislabel them as international devices, even though they are still running a modified version of Android intended only for phones sold in China.

2. Many OEMs have chosen to keep some of their background connection killing logic even in devices sold outside of China. It's hard to say precisely why. Maybe they are competing for higher battery life, or perhaps they don't recognize the problems that it causes. This logic usually kills an app so that it can't wake up to receive a notification, interferes with the FCM connection so that no app receives notifications, or both.

3. Building on #2 -- Many popular devices (e.g. modern Samsung phones) will ask users during the onboarding process if they want to "put apps to sleep after three days of not being used", and the option is "ON" by default. Users often don't realize that this setting will break apps that set background timers (e.g.: alarm apps) or that need to receive notifications.

4. The modifications made by these OEMs are often buggy. For example, user settings get reset after updates. Or the operating system seemingly randomly kills apps in such a way that prevents them from getting notifications.

5. Since OEMs often allow some popular apps to get around these restrictions, it creates an unfair situation between different companies. For example, Facebook might always work fine, but an up-and-coming social network will run into reliability issues.

6. In some cases, devices will appear to receive a push notification but will not display it to the end-user. This can be related to either power management preferences or OEM-specific "do not disturb" preferences.

This is a great website with more information on this issue: https://dontkillmyapp.com/

(Disclosure: I'm one of the founders of OneSignal. We've been tracking and mitigating this problem by implementing "Delivery Confirmation", a feature that lets apps send a webhook whenever the app displays a notification, so that developers can see the end-to-end delivery of each message and either retry their messages or use a different method of contacting the user.)
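A backend that receives such display confirmations can track each outstanding push, retry it with backoff, and finally hand it to another channel. The sketch below is a generic illustration of that retry-then-fallback pattern, not OneSignal's implementation; `DeliveryTracker`, `MAX_PUSH_ATTEMPTS` and `BASE_DELAY_S` are hypothetical names and values.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed policy values for illustration only.
MAX_PUSH_ATTEMPTS = 3
BASE_DELAY_S = 30


@dataclass
class Pending:
    attempts: int
    next_check_at: float


class DeliveryTracker:
    """Track pushes until the app confirms display via a webhook."""

    def __init__(self) -> None:
        self.pending: Dict[str, Pending] = {}

    def sent(self, msg_id: str, now: float) -> None:
        self.pending[msg_id] = Pending(attempts=1, next_check_at=now + BASE_DELAY_S)

    def confirmed(self, msg_id: str) -> None:
        """Called from the webhook handler when the app reports display."""
        self.pending.pop(msg_id, None)

    def tick(self, now: float) -> Tuple[List[str], List[str]]:
        """Return (to_resend, to_fallback) for messages whose check time passed."""
        resend: List[str] = []
        fallback: List[str] = []
        for msg_id, p in list(self.pending.items()):
            if now < p.next_check_at:
                continue
            if p.attempts >= MAX_PUSH_ATTEMPTS:
                # Out of push attempts: hand off to email/SMS, stop tracking.
                fallback.append(msg_id)
                del self.pending[msg_id]
            else:
                p.attempts += 1
                # Wait 60 s after attempt 2, 120 s after attempt 3, and so on.
                p.next_check_at = now + BASE_DELAY_S * 2 ** (p.attempts - 1)
                resend.append(msg_id)
        return resend, fallback
```

Retrying only helps with transient failures; if an OEM has put the app in a permanent "kill in background" bucket, every attempt fails the same way, which is why the fallback channel matters.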

Thank you for this! It seems like many apps on the growth path go through this same investigation, since we arrived at many of the same conclusions you did (down to calling our own webhooks from within onMessageReceived to retry, or to figure out which kinds of devices had the most problems). We spent over a year on this before we hit what I think is our maximum achievable delivery rate of around 85% (short of repeatedly hassling users to go deep into their settings or forcing them to give us an e-mail/phone number to use as backup).

Our app isn't even enabled in China but the points you make in 1 and 2 mean that plenty of devices in South America, Africa, Asia (and even some in North America and Europe) run into these problems.

I think (5) above is a serious unfairness problem since many conversations turn to "I didn't get your notification, let's talk on Whatsapp instead" which affects retention. Now 10-15% failed notification deliveries might not seem like much but if you're talking to 4-5 people either individually or in a group chat then one of them is bound to pull you into a more "reliable" app.
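The "bound to" intuition can be made concrete. Assuming each recipient's delivery fails independently at the same rate (a simplification), the chance that a given message misses at least one of $n$ recipients is $1 - r^n$ for delivery rate $r$:

```python
def p_at_least_one_miss(per_delivery_rate: float, recipients: int) -> float:
    """Chance that at least one of `recipients` misses a given message,
    assuming independent deliveries at the same success rate."""
    return 1 - per_delivery_rate ** recipients


for rate in (0.85, 0.90):
    for n in (4, 5):
        print(f"rate={rate:.2f} recipients={n}: {p_at_least_one_miss(rate, n):.0%}")
```

At 85% delivery this works out to about 48% for 4 recipients and 56% for 5, so roughly every other message in such a conversation fails to reach someone.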

Incidentally, we've considered actually reaching out to those top phone manufacturers and asking to be put on the whitelist; not sure if you or anybody on here has tried anything similar.

We've tried contacting OEMs about some of the more buggy behaviors we've seen related to their power optimization features. Unsurprisingly, even when we did get ahold of someone in the US who was senior enough to understand the problem, they had limited ability to influence the China-based operations of the company. Getting whitelisted seems nearly impossible unless you are a large company with a major presence in China and connections to the OEMs.

Google has hinted that they are working on something to improve the situation, but it's been years without resolution and the problem appears to have become worse instead of better.

There's a lengthy thread in this old Android bug report about the issue https://issuetracker.google.com/issues/122098785

Thanks for the link. I noticed some of these OEMs had started implementing their own "more reliable" PushKits, but I am guessing that the privacy implications are quite bad, so it's basically a non-starter.

Yes. For example, Huawei is not permitted to use Google Services on its new devices. As a result, it relies on a replacement called HMS, which replaces FCM with the HMS "Push Kit".

Privacy or not, it's the only way to send notifications to new Huawei devices that don't have FCM, like the Huawei P40, which are sold worldwide. This isn't a niche issue: Huawei was recently the #1 manufacturer of smartphones outside of China, surpassing both Apple and Samsung. That ranking has since fallen, though, as consumers have become aware of the lack of Google Services on these devices.

#3 shouldn't break normal apps (excluding these Chinese notification solutions). The restrictions apps should adhere to are clearly documented at https://developer.android.com/guide/components/services and https://developer.android.com/about/versions/oreo/background, with WorkManager and JobScheduler examples to use instead. Are you saying OEMs go even deeper and interfere with these APIs, rather than just stopping the background service and the process, as they are allowed to (and sometimes must, due to memory shortage)?

I wouldn't be surprised if OEMs had nasty tricks going on, but dontkillmyapp.com doesn't even mention the existence of these APIs and instead tries to promote some auto-start-on-boot workaround, as if the service must always be running. That gives me very little confidence in the rest of what they are saying. A lot of this has become more aggressive with each recent release, so that could explain the confusion among developers of older apps.

My understanding is that this Samsung device feature operates outside of Android's normal background execution limits.

Samsung's documentation on the feature is here: https://www.samsung.com/us/support/answer/ANS00088422/. Notably, the documentation fails to mention that this breaks notifications and timers (like Alarm Clock apps).

This is compounded by bugs on Samsung devices that prevent reliable notification delivery. Here's a thread on the Samsung forum about one of these bugs with 346 (!!) replies: https://eu.community.samsung.com/t5/galaxy-s9-series/delayed...

It seems hard to believe that Google and Samsung would allow such popular devices with these behaviors to be sold, but here we are.