
The currently prevailing model in Rails is that a worker is a process with its own instance of your application running (i.e. the code loaded, global variables initialized, etc.). Let's say your service makes a request out to OpenAI and waits 5s for a model inference response. During that time, the worker servicing your request sits around, keeps the 300MB (or however big your heap is) in memory, and effectively does nothing while waiting for OpenAI to respond. If you have 30GB of RAM to play this game, you'd be able to run 100 workers like this, and you'd be able to service a whopping 100/(5s per request) = 20 requests per second.
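To make that arithmetic easy to replay with your own numbers, here is a small Python sketch. The function name `max_throughput` is made up for illustration, and it assumes round decimal units (1GB = 1000MB) and that every worker spends the whole request blocked on the upstream call.

```python
def max_throughput(total_ram_gb: float, heap_mb: float, wait_s: float) -> tuple[int, float]:
    """Return (worker count, requests per second) for blocking one-request-per-worker processes."""
    # Assumption: decimal units (1GB = 1000MB), matching the round numbers in the comment.
    workers = int(total_ram_gb * 1000 // heap_mb)
    # Each worker finishes one request every wait_s seconds.
    return workers, workers / wait_s


if __name__ == "__main__":
    workers, rps = max_throughput(total_ram_gb=30, heap_mb=300, wait_s=5)
    print(f"{workers} workers, {rps} requests/second")
```

Halving the heap size or the upstream latency doubles the ceiling, but it stays tied to how many processes fit in RAM.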

Contrast that with other architectures like NodeJS, where an event loop drives execution: it suspends the request that is waiting for OpenAI to respond and works on the next one in the meantime. This lets you service thousands, or even tens or hundreds of thousands, of the same kind of request with the same amount of RAM.
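Here is a rough sketch of the event-loop idea, using Python's `asyncio` as a stand-in for Node's loop (so an analogy, not Node itself). The names `handle_request`, `serve` and `run_demo` are invented for this example, and `asyncio.sleep` stands in for the slow upstream call.

```python
import asyncio
import time


async def handle_request(request_id: int, wait_s: float) -> int:
    # Stand-in for awaiting a slow upstream API. While this request is
    # suspended here, the event loop is free to run other requests.
    await asyncio.sleep(wait_s)
    return request_id


async def serve(n_requests: int, wait_s: float) -> tuple[int, float]:
    start = time.perf_counter()
    # All requests are in flight at once inside a single process.
    results = await asyncio.gather(
        *(handle_request(i, wait_s) for i in range(n_requests))
    )
    return len(results), time.perf_counter() - start


def run_demo(n_requests: int = 1000, wait_s: float = 0.1) -> tuple[int, float]:
    return asyncio.run(serve(n_requests, wait_s))


if __name__ == "__main__":
    done, elapsed = run_demo()
    print(f"{done} requests finished in {elapsed:.2f}s")
```

The total wall time stays close to a single wait rather than the sum of all waits, because a suspended request costs a small task object instead of a whole process heap.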

There are approaches to improve this in Ruby/Rails, like Fibers; however, lots of libraries in the ecosystem use global mutable state and assume it's request-local. If multiple requests are served concurrently by the same worker, they'll overwrite each other's state and bugs will happen. Also, bolting this onto the language is not very ergonomic ("beautiful" in Ruby speak) compared to languages where concurrency has been a primary design concern from the beginning.
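The global-state problem is easy to reproduce in any language with cooperative concurrency. Below is a minimal Python sketch of it, not Ruby and not any real library's code: `current_user` plays the role of a global that some hypothetical library assumes is request-local, and `contextvars` is Python's rough analog of per-task storage. All names are made up for illustration.

```python
import asyncio
import contextvars

# Module-level global that a hypothetical library treats as "request local".
current_user = None
# Per-task storage: each asyncio task gets its own copy of the context.
current_user_var = contextvars.ContextVar("current_user_var", default=None)


async def handle_with_global(user: str) -> tuple[str, str]:
    global current_user
    current_user = user
    await asyncio.sleep(0.01)  # waiting on I/O; other requests run and overwrite the global
    return user, current_user


async def handle_with_contextvar(user: str) -> tuple[str, str]:
    current_user_var.set(user)
    await asyncio.sleep(0.01)  # other requests run, but they write to their own context
    return user, current_user_var.get()


async def run_all(handler):
    # Three "requests" served concurrently by the same worker.
    return await asyncio.gather(*(handler(u) for u in ["alice", "bob", "carol"]))


def mismatches(handler) -> int:
    """Count requests that read back a different user than they stored."""
    results = asyncio.run(run_all(handler))
    return sum(1 for expected, seen in results if expected != seen)


if __name__ == "__main__":
    print("global:", mismatches(handle_with_global))
    print("contextvar:", mismatches(handle_with_contextvar))
```

With the plain global, the earlier requests read back whatever the last request wrote. The fix is simple in your own code, but you only get it if every library in the stack was written with concurrent requests in mind.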