Nice read. Just curious: isn't this usually handled by rate throttling at the load-balancer/gateway level, i.e. only forwarding as many requests to the database as it can handle?
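
For anyone wondering what gateway-level throttling might look like, here is a rough token-bucket sketch. It is only one common way to do it, not something taken from the article; `TokenBucket`, `handle`, and the `rate`/`capacity` values are all made-up names for illustration, and a real gateway would normally use its built-in rate limiter.

```python
import time


class TokenBucket:
    """Hypothetical limiter: allows roughly `rate` requests per second,
    with bursts of up to `capacity` requests."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens refilled per second (assumed DB throughput)
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity      # start with a full bucket
        self.clock = clock          # injectable clock, handy for testing
        self.last = clock()

    def allow(self):
        # Refill in proportion to elapsed time, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def handle(bucket, forward):
    """Forward the request only if the bucket has a token, otherwise shed it."""
    if bucket.allow():
        return 200, forward()
    return 503, "too busy"
```

Something like `handle(TokenBucket(rate=100, capacity=20), query_db)` would cap traffic at about 100 requests per second, where `query_db` stands in for whatever actually calls the database. Note that a fixed rate like this assumes you know what the database can handle ahead of time.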
I like the idea of priority queues driven by the number of prior successful requests plus wait time, so that once you're in, you get reasonable performance; otherwise you get a 503 "too busy" until you've waited long enough.
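
A minimal sketch of how that scoring could work, assuming priority is just a weighted sum of prior successes and seconds waited. The function names, the weights, and the tuple layout are my own assumptions for illustration, not anything from the article.

```python
import heapq


def priority(prior_successes, waited_s, success_weight=1.0, wait_weight=1.0):
    """Hypothetical score: established clients and long waiters rank higher."""
    return success_weight * prior_successes + wait_weight * waited_s


def admit(pending, capacity):
    """Pick the `capacity` highest-priority requests; the rest would get a 503.

    `pending` is a list of (client_id, prior_successes, waited_s) tuples,
    with client ids assumed to be unique.
    """
    ranked = heapq.nlargest(capacity, pending, key=lambda p: priority(p[1], p[2]))
    admitted = [p[0] for p in ranked]
    admitted_ids = set(admitted)
    rejected = [p[0] for p in pending if p[0] not in admitted_ids]
    return admitted, rejected


pending = [("a", 5, 0.0), ("b", 0, 1.0), ("c", 0, 8.0), ("d", 2, 0.5)]
admitted, rejected = admit(pending, capacity=2)
print(admitted)  # ['c', 'a']
print(rejected)  # ['b', 'd']
```

Here "c" gets in purely on wait time and "a" on history, while "b" and "d" would see the 503 until their wait pushes their score up. The weights decide how quickly a newcomer can overtake an established client.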
See chapters 19 to 22: https://sre.google/sre-book/load-balancing-frontend/