> Here’s an exchange I had on twitter a few months ago:

The purple account is just plain wrong. Classically, the full architecture is this (keeping in mind that all rules are sometimes broken):

* CQRS is the linchpin.
* You generally only queue commands (writes). A few hundred ms of latency on those typically won't be noticed by users.
* Reads happen from either a read replica or a cache.

The problems the author faces are caused by cherry-picking bits of the full picture. A queue is a load-smoothing operator. Things are going to go bad one way or another if you exceed capacity; a queue at least guarantees progress (up to a point). Queue depth is also a great metric to use to scale your worker count.

> What will you do when your queue is full

If your queue fills up, you need to start rejecting requests. If you have a public-facing API, there's a good chance that there will be badly behaved clients that don't back off correctly, so you'll need a way to IP ban them until things calm down. AWS has API Gateway and Azure has APIM, which can help with this. If you're separating commands and queries you should _typically_ see more headroom.
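
To make the "reject when full" and "scale on queue depth" points concrete, here is a rough Python sketch. It is only an illustration: `CommandQueue`, `desired_workers`, the 202/429 status codes, and the `per_worker` sizing rule are all made-up names and assumptions, and a real system would sit behind a broker or gateway, not an in-process queue.

```python
import queue


class CommandQueue:
    """Hypothetical bounded command queue that sheds load when full."""

    def __init__(self, capacity: int) -> None:
        # maxsize makes the queue bounded; put_nowait raises queue.Full beyond it.
        self._q: "queue.Queue[str]" = queue.Queue(maxsize=capacity)

    def submit(self, command: str) -> int:
        """Return an HTTP-style status: 202 if queued, 429 if rejected."""
        try:
            self._q.put_nowait(command)
        except queue.Full:
            # Fail fast so well-behaved clients can back off and retry later.
            return 429
        return 202

    def depth(self) -> int:
        """Approximate number of commands waiting to be processed."""
        return self._q.qsize()


def desired_workers(depth: int, per_worker: int, max_workers: int) -> int:
    """Assumed scaling rule: ceil(depth / per_worker), clamped to [1, max_workers]."""
    needed = -(-depth // per_worker)  # ceiling division on integers
    return max(1, min(max_workers, needed))


if __name__ == "__main__":
    q = CommandQueue(capacity=2)
    print([q.submit(c) for c in ("create", "update", "delete")])
    print(desired_workers(q.depth(), per_worker=10, max_workers=5))
```

The point of the bounded queue is that the third submit is refused immediately instead of piling up latency, while the depth reading gives an autoscaler something to act on before rejections start.
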
But even if you shifted reads to one or more caches or read replicas, wouldn't those also have queues that will fill up when you are under-provisioned?
Note that I'm using the term "queue" pretty loosely, to include things like Redis' maxclients or tcp-backlog settings, or client-side queues that form when all connections are in use.