by throwaway427
544 days ago
This is a great presentation. Where I work we don't use any process-level parallelism at the Ruby level. We hoist that up to the Kubernetes level and use signals (CPU load, job queue sizes, etc.) to increase or decrease capacity. Workloads (replica sets) are segmented across multiple dimensions (different types of API traffic, worker queues) and are tuned for memory, CPU, and thread counts according to their needs. Some heavy IO workloads can exceed a single CPU ever so slightly because the DB adapter isn't bound by the GVL, but practically speaking a pod/Ruby process can only utilize 1 CPU, regardless of thread count. One downside of this approach is that our app takes a long time to boot. This, along with the time to provision new nodes, can cause pod autoscalers to flap or overprovision if we don't periodically tune our workloads. In a perfect world we would be able to spawn processes/pods that are already warmed up and preloaded (similar to forking, but at the k8s level, with the processes detached from the root process). They would not be constrained by the CPU capacity of the underlying k8s node they run on, and would instead draw on what is basically an infinite pool of CPUs where we only pay for what we use. Obviously serverless sort of offers this kind of solution if you squint, but it is not a good fit for our architecture.
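To make the "signals drive capacity" part concrete, here is a minimal sketch of the replica calculation, assuming something like the formula documented for the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current * currentMetric / targetMetric), with a default tolerance of 0.1). The `desired_replicas` helper and the numbers are hypothetical, not our actual setup, and the sketch ignores pod readiness and stabilization windows.

```ruby
# Hypothetical helper: mirrors the documented HPA formula, not any real client API.
# current_metric and target_metric are per-pod averages (e.g. CPU % or queued jobs).
def desired_replicas(current_replicas, current_metric, target_metric, tolerance: 0.1)
  ratio = current_metric.to_f / target_metric
  # Close enough to the target: keep the current replica count.
  return current_replicas if (ratio - 1.0).abs <= tolerance

  # Otherwise scale proportionally and round up.
  (current_replicas * ratio).ceil
end

puts desired_replicas(4, 90, 60)  # 90% CPU against a 60% target
puts desired_replicas(4, 63, 60)  # within tolerance, no change
puts desired_replicas(10, 30, 60) # underloaded, scale down
```

With a slow-booting app, the metric can stay high while new pods are still starting, so repeated evaluations of a rule like this may keep asking for more replicas than are eventually needed.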
Another downside is that you're leaving a lot of memory savings on the table by not benefiting from Copy on Write.
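For anyone unfamiliar with what that means in practice, here is a rough sketch of the preload-then-fork pattern, assuming a Unix-like OS where `Process.fork` is available. `PRELOADED` and `run_forked_workers` are made-up names standing in for an app's boot-time state and its worker spawner. How much memory actually stays shared depends on the GC and allocator, so treat this as an illustration rather than a measurement.

```ruby
# Stand-in for everything an app loads at boot (hypothetical data).
PRELOADED = Array.new(100_000) { |i| "row-#{i}" }.freeze

# Forks `count` children that read the parent's preloaded data and
# report back over a pipe. Nothing is reloaded in the children.
def run_forked_workers(count)
  children = count.times.map do |n|
    reader, writer = IO.pipe
    pid = Process.fork do
      reader.close
      # Pages holding PRELOADED are shared with the parent until written to.
      writer.write("worker #{n} sees #{PRELOADED.size} rows")
      writer.close
      exit!(0)
    end
    writer.close
    [pid, reader]
  end

  children.map do |pid, reader|
    message = reader.read
    reader.close
    Process.wait(pid)
    message
  end
end

puts run_forked_workers(2)
```

Separate pods each boot and load their own copy of that state, which is the memory cost being pointed out here.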