What size machines are these? I'm shocked that this volume is the most you can handle with Erlang, unless you're using a smaller T-series AWS instance for this.
Each of these easily gets over 30,000 requests a second about updating queues for new messages. They are also subscribed to presence events from our presence system for millions of people. It is a very busy service, ensuring we only deliver messages to people who are not at their computer.
So if they are delivering 30k a second per box and the max "backlog" you allow to build is ~50k, then you cap your backlog at under 2 seconds' worth of delivery? Or am I missing something?
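For reference, the back-of-the-envelope arithmetic behind that "under 2 seconds" figure can be sketched as below. This is only a rough sketch: it assumes a backlog item and a request are the same unit of work and that the ~30k/s rate is steady, neither of which is confirmed in the thread. The function name `backlog_drain_seconds` is made up for illustration.

```python
def backlog_drain_seconds(backlog_items: int, rate_per_second: float) -> float:
    """Seconds needed to work through a backlog at a steady processing rate."""
    if rate_per_second <= 0:
        raise ValueError("rate_per_second must be positive")
    return backlog_items / rate_per_second


# Approximate figures quoted in the thread: ~50k max backlog, ~30k requests/s per box.
# Assumption: one backlog item costs about the same as one request.
drain = backlog_drain_seconds(50_000, 30_000)
print(f"{drain:.2f} s")  # prints "1.67 s"
```

If a backlog item is actually more expensive than a typical request, the real drain time would be longer than this estimate.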