by malthejorgensen 1990 days ago
How do you manage the bare-metal cluster? (E.g., apt/yum updates, but also networking and such.)

I'm a bit out of date, but if we are talking about rendering (not data-retrieval workloads), I believe the best approach is fundamentally the same as it was 25 years ago: network boot, mostly network storage, and local config overlays applied based on MAC address or an equivalent identifier. I'm not sure exactly which push or pull techniques are in vogue, but definitely don't run a package manager on each node. You want as little as possible stored locally: just a scratch disk that can be rebuilt automatically in minutes.
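A minimal Python sketch of the "config overlay keyed by MAC address" idea, assuming every node shares one base config and a few nodes carry small overrides. `BASE_CONFIG`, `OVERLAYS`, the NFS path, and the keys inside them are hypothetical placeholders. The file-name helper follows the PXELINUX convention of looking for a per-host file named `01-` (the ARP type for Ethernet) plus the lowercase MAC with dashes before falling back to `default`; other boot loaders use different lookup rules.

```python
import re

# Hypothetical base config shared by every render node (illustrative keys only).
BASE_CONFIG = {
    "root": "nfs:fileserver:/export/render-root",
    "scratch_disk": "/dev/sda",
    "role": "render",
}

# Hypothetical per-node overlays, keyed by normalized MAC address.
OVERLAYS = {
    "aa:bb:cc:dd:ee:01": {"hostname": "render01"},
    "aa:bb:cc:dd:ee:02": {"hostname": "render02", "gpu": "true"},
}


def normalize_mac(mac: str) -> str:
    """Return a MAC as lowercase colon-separated hex; accepts ':' or '-' separators."""
    digits = re.sub(r"[^0-9a-fA-F]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def pxelinux_config_name(mac: str) -> str:
    """PXELINUX-style per-host file name: '01-' plus the MAC with dashes."""
    return "01-" + normalize_mac(mac).replace(":", "-")


def node_config(mac: str) -> dict:
    """Copy of the base config with this node's overlay (if any) applied on top."""
    merged = dict(BASE_CONFIG)
    merged.update(OVERLAYS.get(normalize_mac(mac), {}))
    return merged


if __name__ == "__main__":
    print(pxelinux_config_name("AA-BB-CC-DD-EE-02"))  # 01-aa-bb-cc-dd-ee-02
    print(node_config("AA:BB:CC:DD:EE:02")["hostname"])  # render02
```

Unknown MACs simply get the base config, which keeps the "rebuild a node in minutes" property: a replacement machine boots the same shared image and only needs an overlay entry if it differs from the rest.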
When it was 3 nodes, and then 6 nodes, the answer was: very unprofessionally. I didn't get the budget for a system administrator, and I spent all my budget on developers who could build our application and automate our preprocessing, overlooking system administration skills. So besides being the DoE, managing 3 small teams, and being the lead developer, I am also the system administrator.

So, no fancy answer: our 3D experts got TeamViewer access to the nodes, which run Windows Pro. Sometimes our renders fail on Patch Tuesday because I forgot to reapply the no-reboot hack.

We're professionalizing now at 12 nodes. We've reached the point where the 3D experts no longer need to TeamViewer in, so we're switching the nodes to headless Linux. No idea about update management yet, but they're clean nodes running Ubuntu Server.

Networking solutions depend heavily on the physical infrastructure, but for setup and maintenance, you'll often see SaltStack used.