by curious_cat_163
748 days ago
> a stateful load balancer that is aware of each server's available slots

Interesting. I'm curious what a 'slot' is in this context. Is it llama.cpp-specific application-layer state that llama.cpp makes available? Or is it application-layer state that is being inferred? If the latter, how?