| Massive props for getting it done anyway. For others reading: in general a switch should never run dhcpd itself, but it will normally relay DHCP for you. Your Aristas would 100% have supported relaying, though in this case it sounds like the network might even be flat L2, where no relay is needed. Normally you'd host dhcpd on a server. Some general feedback in case it's helpful:
- 20K on contractors seems insane if we're talking about rack and stack for 10 racks. Many datacentres can be persuaded to do it for free as part of you agreeing to sign their contract. Your contractors should at least be using a server lift of some kind, again often kindly provided by the facility. If this included paying for server configuration and so on, then ignore that comment (bargain!).
- I would almost never expect to actually pay a setup fee (beyond something nominal like 500 per rack) to the datacentre either. If you are going to pay that fee, it had better include rack and stack.
- A crash cart should not be used for an install of this size. The servers should be plugged into the network and then automatically configured by a script/iPXE. It might sound intimidating or hard but it's not, and it doesn't even require IPMI (though frankly I would strongly, strongly recommend IPMI if you don't already have it). I would use managed switches for the management network too, for sure.
- Consider two switches, especially if they are second-hand. Even here, the cost of the cluster being unusable for a few days while you source and install a replacement is probably still thousands.
- Personally I'm not a big fan of the whole JBOD architecture and would have just filled my boots with single-socket 4U Supermicro chassis. To each their own, but JBOD's main benefit is a very small financial saving at the cost of quite a lot of drawbacks IMO. YMMV.
- Depending on who you use for GPUs, getting a private link or 'peering' to them might save you some cost and provide higher capacity.
- I'm kind of shocked that FMT2 didn't turn out much cheaper than your current colo. I'd expect less than those figures, possibly even with the 100G DIA included (normally about $3000/month, no setup). |
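For anyone wondering what "automatically configured by a script/iPXE" might look like, here is a minimal sketch, not anything from the original build. It assumes a hypothetical MAC-to-hostname inventory and a made-up boot server address, and it only generates the iPXE script text; the kernel arguments are placeholders that you would swap for whatever your installer actually expects.

```python
# Hypothetical inventory mapping NIC MAC address -> hostname.
# In practice this might come from a spreadsheet or asset database.
INVENTORY = {
    "aa:bb:cc:dd:ee:01": "node01",
    "aa:bb:cc:dd:ee:02": "node02",
}

# Assumed address of an HTTP server holding the installer kernel and initrd.
BOOT_SERVER = "http://10.0.0.1"


def render_ipxe_script(mac: str) -> str:
    """Return the text of an iPXE script for the machine with this MAC."""
    hostname = INVENTORY.get(mac.lower())
    if hostname is None:
        # Unknown machine: drop to the iPXE shell instead of installing.
        return f"#!ipxe\necho Unknown MAC {mac}, dropping to shell\nshell\n"
    return "\n".join([
        "#!ipxe",
        # "hostname=" is a placeholder argument; installers differ here.
        f"kernel {BOOT_SERVER}/vmlinuz initrd=initrd.img hostname={hostname}",
        f"initrd {BOOT_SERVER}/initrd.img",
        "boot",
        "",
    ])


if __name__ == "__main__":
    print(render_ipxe_script("AA:BB:CC:DD:EE:01"))
```

The idea is that the DHCP server hands iPXE a URL, a small HTTP endpoint returns this text for the requesting machine, and iPXE runs it. iPXE can put the client's MAC in the request itself (for example via `${net0/mac}` in the chained URL), so one endpoint can cover every server in the racks.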
For iPXE, do you have any reference material you'd recommend? We had 3 people, each with reasonably substantial server experience, try for like 6 hours each, and for whatever reason it turned out to be too difficult.