by justinclift 1061 days ago
Maybe there needs to be a better "burn in" test setup for their new hardware, just to catch mistakes in the build prep and/or catch bad hardware?

Not that nothing will fail, but some manufacturers have really good fault management, monitoring, alerting, etc. Even the simplest shit helps, like SNMP with a few custom MIBs from the vendor (some vendors do this better than others). Facilities and vendors that lend a good hand with remote hands are also nice if your remote management infrastructure should fail. But out-of-band, full-featured management cards with all the trimmings work so well. Some do good Redfish BMC/JSON/API stuff too, on top of the usual SNMP and other nice built-in Easy Buttons. And with today's tooling for bare metal and KVM, working around faults can be quite seamless. There are even good NVMe RAID options if you absolutely must have your local box with mirrored data protection, and 10/40/100Gbps cards with a good libvirt setup can migrate large VMs in mere minutes, resuming on the remote end with a blip of around 1ms.
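To make the Redfish part concrete, here is a minimal sketch of the kind of health check a BMC's JSON API allows. It assumes you have already fetched a system resource (Redfish exposes these under /redfish/v1/Systems) and only inspects the standard Status.Health field. The sample payload, serial number, and function name are made up for illustration, and fetching and authentication are left out.

```python
import json

# Hypothetical, trimmed-down Redfish ComputerSystem payload. Real BMCs
# return many more fields; only Status.Health is inspected here.
SAMPLE_SYSTEM = """
{
  "Id": "1",
  "SerialNumber": "ABC1234",
  "Status": {"State": "Enabled", "Health": "Warning"}
}
"""


def needs_attention(system_json):
    """Return True if the reported health is anything other than "OK".

    Redfish reports Health as "OK", "Warning" or "Critical". A missing
    value is treated as a problem too, since the health is then unknown.
    """
    system = json.loads(system_json)
    health = system.get("Status", {}).get("Health")
    return health != "OK"


if __name__ == "__main__":
    if needs_attention(SAMPLE_SYSTEM):
        print("system needs attention")
```

Something like this would normally feed an alerting system instead of printing, alongside whatever SNMP traps the vendor already sends.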
Good point. :)

I'm still wondering about their hardware acceptance/qualification though, prior to it being deployed. ;)

Yah, presumably they put stuff through its paces and give everything good fit and finish before running workloads. But failures do happen either way.
Could you expand your answer to list vendors which you would recommend?
"It depends". Dell is fairly good overall, but the on-site techs are often outsourced subcontractors, so that can be a mixed bag, and the sales side is pushy. Supermicro is good on a budget, but its fault management, SNMP, and Redfish support are not quite mature or complete, and they can EOL a new line of gear suddenly.
Have you come across Fujitsu PRIMERGY servers before?

https://www.fujitsu.com/global/products/computing/servers/pr...

I used to use them a few years ago in a local data centre, and they were pretty good back then.

They don't seem to be widely known about though.

Have not - looks nice though. Around here, you'll mostly only encounter Dell/Supermicro/HP/Lenovo. I actually find Dell to have achieved the lowest "friction" for deployments. You can get device manifests before the gear even ships, including MAC addresses, serials, out of band NIC MAC, etc. We pre-stage our configurations based on this and have everything ready to go (rack location/RU, switch ports, PDUs, DHCP/DNS). We literally just plug it all in and power on, and our tools take care of the rest without any intervention. Just verify the serial number of the server and stick it in the right rack unit, done.
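As a rough illustration of the pre-staging idea (not our actual tooling), here is a minimal sketch that joins a vendor manifest with a rack plan and emits ISC dhcpd host reservations for both the main NIC and the out of band NIC. The CSV columns, serials, hostnames, and addresses are all made up; a real vendor manifest will have its own layout.

```python
import csv
import io

# Hypothetical manifest layout; real vendor manifests will differ.
MANIFEST_CSV = """serial,nic_mac,bmc_mac
ABC1234,AA:BB:CC:00:00:01,AA:BB:CC:00:10:01
ABC1235,AA:BB:CC:00:00:02,AA:BB:CC:00:10:02
"""

# Hypothetical rack plan keyed by serial number.
RACK_PLAN = {
    "ABC1234": {"hostname": "r12u07", "ip": "10.0.12.7", "bmc_ip": "10.1.12.7"},
    "ABC1235": {"hostname": "r12u08", "ip": "10.0.12.8", "bmc_ip": "10.1.12.8"},
}


def dhcp_host_entries(manifest_text, rack_plan):
    """Build one dhcpd host line per NIC for every planned server."""
    entries = []
    for row in csv.DictReader(io.StringIO(manifest_text)):
        plan = rack_plan.get(row["serial"])
        if plan is None:
            continue  # shipped but not in the rack plan; handle by hand
        # Each server gets two reservations: main NIC and out of band NIC.
        for suffix, mac_key, ip_key in (
            ("", "nic_mac", "ip"),
            ("-bmc", "bmc_mac", "bmc_ip"),
        ):
            entries.append(
                "host {name} {{ hardware ethernet {mac}; fixed-address {ip}; }}".format(
                    name=plan["hostname"] + suffix,
                    mac=row[mac_key].lower(),
                    ip=plan[ip_key],
                )
            )
    return entries


if __name__ == "__main__":
    print("\n".join(dhcp_host_entries(MANIFEST_CSV, RACK_PLAN)))
```

The same join could drive DNS records and switch port configs, which is why having the MACs and serials before the gear ships matters so much.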
> You can get device manifests before the gear even ships, including MAC addresses, serials, out of band NIC MAC, etc.

That does sound pretty useful.

So for yourselves, you rack them then run hardware qualification tests?