by adityapatadia 636 days ago
Nice ideas, but we chose a really simple Kubernetes deployment. We install only the host OS (Ubuntu Server) and then join the self-hosted GPU servers as workers in a Kubernetes cluster.

No other setup is needed, and our Grafana monitors whether the server (and its containers) are up and running.

2 comments

Sorry, my suggestion was for if you need to do training. If you're only serving, then the suggestions I made aren't as valuable, and something like what you've done probably makes more sense. But you want a proper cluster setup to do multi-GPU and especially multi-node stuff.
> "Would you mind sharing the name of the data center?"

Curious to know what you use other than Grafana in your monitoring stack. We use Prometheus for metrics/alerts and Loki/Promtail for logs.
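
For anyone wondering what an "is it up" check looks like on the Prometheus side, here is a rough sketch rather than anyone's actual setup. It assumes you already have the JSON returned by an instant query for the built-in `up` metric (the `/api/v1/query` endpoint). The function name `down_targets` and the sample payload, including the node names, are made up for illustration.

```python
def down_targets(response):
    """Return the `instance` labels whose `up` sample is not 1.

    `response` is the decoded JSON body of a Prometheus instant query
    for `up`. Fetching it over HTTP is left out on purpose.
    """
    if response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")

    down = []
    for series in response["data"]["result"]:
        # Each instant-vector sample is [unix_timestamp, "value as string"].
        _timestamp, value = series["value"]
        if float(value) != 1.0:
            down.append(series["metric"].get("instance", "<unknown>"))
    return sorted(down)


# Hypothetical payload: two GPU workers, one of them not scrapeable.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "instance": "gpu-node-1:9100"},
             "value": [1700000000.0, "1"]},
            {"metric": {"__name__": "up", "instance": "gpu-node-2:9100"},
             "value": [1700000000.0, "0"]},
        ],
    },
}

print(down_targets(sample))
```

In practice the same condition (`up == 0`) would more likely live in an alerting rule than in a script. The snippet is only meant to show the shape of the data.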