by budududuroiu 483 days ago
Does anyone know if there’s a benefit to porting this to an orchestrator like K8s? It's maybe overkill for training, but the KVCache might be useful when running multiple replicas for inference.