Would love to hear more about the scale issues you saw. How many workflows or actions was too many? Which components started breaking down, and what were their failure modes?
See above. It's not so straightforward. You need enough headroom on each component that when a runaway feedback loop starts and eats resources, there is enough time and capacity for it to calm down before it hits some limit or degrades the component further.
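A toy model of the headroom point, purely illustrative and not taken from the system discussed here. It assumes a single component with a fixed per-tick `capacity`, a steady `base_load`, and a temporary `spike`. Every request it can't serve comes back next tick multiplied by `retry_factor` (a hypothetical stand-in for retries and duplicated work). `queue_limit` is a hypothetical hard limit. All of these names are made up for the sketch.

```python
def recovers(capacity, base_load, spike, spike_ticks,
             retry_factor=1.5, queue_limit=1000.0, ticks=200):
    """Toy model: does the backlog drain to zero before hitting queue_limit?"""
    backlog = 0.0
    for t in range(ticks):
        # Steady load plus a temporary disturbance for the first spike_ticks ticks.
        arrivals = base_load + (spike if t < spike_ticks else 0.0)
        demand = arrivals + backlog
        # Whatever exceeds capacity goes unserved this tick.
        unserved = max(0.0, demand - capacity)
        # Unserved work returns next tick, amplified (the feedback loop).
        backlog = unserved * retry_factor
        if backlog > queue_limit:
            return False  # hit the hard limit before calming down
    return backlog == 0.0


# 40% headroom absorbs a short spike; 10% headroom does not.
print(recovers(capacity=100, base_load=60, spike=50, spike_ticks=3))
print(recovers(capacity=100, base_load=90, spike=30, spike_ticks=3))
# Same 40% headroom, but the disturbance lasts too long to recover from.
print(recovers(capacity=100, base_load=60, spike=50, spike_ticks=10))
```

In this model, with `retry_factor` r > 1, the backlog drains only if it is still below r·(capacity − base_load)/(r − 1) when the spike ends. Above that point it grows without bound. So the size and duration of the disturbance a component can absorb scale directly with its headroom.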