A simple way to do that in Elixir / Erlang is to just have each node spawn 20 processes that fetch work from redis (or a central coordinating process on another node)
Yeah, so the details of such is what I have not seen any examples of. I do plan to figure it out soon and then hopefully share it to benefit others. From what I understand thus far, I'll create a GenServer to connect to redis (so only using a single redis connection), pull jobs from redis and spawn Tasks on all nodes registered with the GenServer. And then I need to figure out how to limit the GenServer to a single instance per app on all nodes, or if too much hassle, just allow a GenServer per node and only spawn Tasks on the local node (so 1 redis connection per node).
A simple approach: Set up your apps' supervision tree such that you got poolboy [1] with a fixed pool of 20 worker processes and the one GenServer that distributes the work.
If you go with one GenServer per node, then that one just connects to redis and just pulls jobs (using BLPOP or whatever). When it has gotten a job it checks out a worker from poolboy and assigns it that work.
You could also just have every single worker process go directly to redis and have it pull jobs in a loop. But where's the fun in that ;)
If you want a single global coordinator instead of one per node you can use :global [2] to globally register a process in the cluster. This process is then cluster-wide reachable under its registered name. It can talk to each of your worker pools in the cluster and round-robin try to check out workers and assign them work. And if you do this you might as well ask yourself if you really need redis instead of keeping it all within your Elixir system.
Deciding on which node this process lives is still up to you, but there are libraries like locks [3] that allow you to automatically determine a leader in your cluster.
And once this is done you can start dealing with overload :)
Of course this is just a simple and naive approach, there are a lot of really useful Erlang libraries to check out. Here's a list of libs that helped me when getting started by reading their docs and sources in no particular order:
https://github.com/jlouis/safetyvalve - queueing facilities for tasks to be executed so their concurrency and rate can be limited on a running system
https://github.com/fishcakez/sbroker - process broker for matchmaking between two groups of processes using sojourn time based active queue management to prevent congestion.
https://github.com/ferd/backoff - exponential backoffs and timers to be used within OTP processes when dealing with cyclical events, such as reconnections, or generally retrying things