Hacker News new | ask | show | jobs
by zozbot234 382 days ago
You could load different "experts" in a round-robin way on a single node and only aggregate "batches" opportunistically, when you just have multiple requests in-flight that all happen to rely on the same "expert". The difference being that instead of "batches", you would only really have queues. Of course this would come with a sizeable increase in latency, but that's acceptable for many applications (such as for "deep research" workflows)
1 comments

This is very much like Erlang's actor model. The same compute can be run in parallel, or managed via queues. With Erlang's strong support for FFI and process control, I wonder if it's being used as a dispatcher for these sorts of workloads.