That's one problem; another is the size of the power supply. And maybe that's the only problem: I don't see why a GPU would become unstable when running on fewer lanes. All it should do is get slower.
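To put a number on "slower": here is a rough sketch of theoretical one-direction link bandwidth by lane count. It assumes PCIe 3.0 signalling (8 GT/s per lane, 128b/130b encoding) and ignores protocol overhead, so real throughput will be somewhat lower.

```python
# Theoretical one-direction PCIe bandwidth per link width.
# Assumption: PCIe 3.0 (8 GT/s per lane, 128b/130b line encoding).
# Protocol overhead (packet headers, flow control) is ignored.

GT_PER_S = 8.0          # raw transfers per second per lane, in billions
ENCODING = 128 / 130    # fraction of transferred bits that carry data


def pcie3_bandwidth_gb_s(lanes: int) -> float:
    """Approximate usable GB/s in one direction for a link of `lanes` lanes."""
    if lanes not in (1, 2, 4, 8, 16):
        raise ValueError(f"unsupported link width: x{lanes}")
    # Each transfer moves one bit per lane; divide by 8 to get bytes.
    return lanes * GT_PER_S * ENCODING / 8


if __name__ == "__main__":
    for width in (4, 16):
        print(f"x{width}: {pcie3_bandwidth_gb_s(width):.2f} GB/s")
```

An x4 link gets a quarter of the host-to-GPU bandwidth of an x16 link. That hurts workloads that stream a lot of data to the card, but by itself it shouldn't make the card unstable.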
I just bought a GTX 1080 Ti plus a similar Corsair PSU as an upgrade for my 3-year-old Dell, and it works like a charm.
If you have a PSU that big, then that probably isn't the problem. I thought you might be using the PSU that comes with those extender boxes, and those are usually very puny (250 W or so).
Yes, each GPU has an x4 -> x16 extender and an x4 -> x4 extender, in addition to the M.2 -> PCIe x4 adapter.
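A toy model of one GPU's riser chain shows two consequences of this setup: the link can only train to the narrowest piece in the chain, and every extra piece adds another mated connector. The order of the pieces and the `Segment` type are hypothetical, and cable joints inside the extenders are ignored.

```python
# Hypothetical model of one GPU's riser chain. The order of the pieces is
# an assumption; the thread only lists which adapters are involved.
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str
    electrical_lanes: int  # lanes actually wired through this piece


CHAIN = [
    Segment("M.2 -> PCIe x4 adapter", 4),
    Segment("x4 -> x4 extender", 4),
    Segment("x4 -> x16 extender", 4),  # physical x16 slot, x4 edge feeding it
]


def negotiated_width(chain: list[Segment], device_lanes: int) -> int:
    """The link trains down to the narrowest piece between host and GPU."""
    return min([device_lanes] + [s.electrical_lanes for s in chain])


def mated_connections(chain: list[Segment]) -> int:
    """Connector pairs from host slot to GPU edge: one per piece, plus the GPU."""
    return len(chain) + 1


if __name__ == "__main__":
    print("negotiated width: x%d" % negotiated_width(CHAIN, device_lanes=16))
    print("mated connections:", mated_connections(CHAIN))
```

With three pieces per GPU there are at least four connector joints per card, each a place where a marginal contact could cause link errors.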
So many potential failure points in there. The sole use case is CUDA. Essentially, I wanted a portable GPU cluster, and that did the job for a couple of months. Now it's getting more serious, so the switch to the T630s makes sense, and I repurposed the NUCs as the control plane of the K8s cluster.
Replicating it is not very hard. You need a lightweight x86 machine for MAAS, which takes ~20 min to install, one VLAN for the iDRAC (IPMI), and another VLAN for the regular network with internet access, and off you go. You can also enable KVM power management in MAAS to run the Juju control plane in VMs and save a box if you're short on compute.
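Before handing machines to MAAS, it can help to sanity-check the two-VLAN address plan. This is a hypothetical pre-flight check, not part of MAAS: the subnets, node names and addresses are made-up examples, and it only uses Python's standard `ipaddress` module.

```python
# Hypothetical address plan: one VLAN for iDRAC/IPMI, one for the regular
# internet-connected network. All subnets and addresses are examples.
import ipaddress

IPMI_NET = ipaddress.ip_network("10.10.0.0/24")  # VLAN for iDRAC (IPMI)
DATA_NET = ipaddress.ip_network("10.20.0.0/24")  # VLAN with internet access

NODES = {
    # node name: (iDRAC address, data-network address)
    "t630-1": ("10.10.0.11", "10.20.0.11"),
    "t630-2": ("10.10.0.12", "10.20.0.12"),
}


def check_plan(ipmi_net, data_net, nodes) -> list[str]:
    """Return a list of problems; an empty list means the plan is consistent."""
    problems = []
    if ipmi_net.overlaps(data_net):
        problems.append(f"{ipmi_net} overlaps {data_net}")
    for name, (bmc, host) in nodes.items():
        if ipaddress.ip_address(bmc) not in ipmi_net:
            problems.append(f"{name}: iDRAC {bmc} not in {ipmi_net}")
        if ipaddress.ip_address(host) not in data_net:
            problems.append(f"{name}: host {host} not in {data_net}")
    return problems


if __name__ == "__main__":
    issues = check_plan(IPMI_NET, DATA_NET, NODES)
    print("plan OK" if not issues else "\n".join(issues))
```

Keeping the BMCs on their own subnet means a typo in an iDRAC address shows up here rather than as a node MAAS can't power-cycle.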
The PSU is the Corsair AX1500i (1500 W), with 10 lines for GPUs. It's robust on paper, and I didn't have any problems with just 4 plugged in.
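A back-of-the-envelope budget shows why a ~250 W extender-box PSU is marginal while a 1500 W unit has room for four cards. The numbers are assumptions, not from the thread: ~250 W per GPU (the rated board power of a GTX 1080 Ti-class card) and a rule-of-thumb target of keeping sustained load at or below 80% of the PSU rating.

```python
# Rough PSU budget check. Assumptions: ~250 W per GPU and a rule-of-thumb
# ceiling of 80% of the PSU rating for sustained load. `other_watts` is
# for anything else on the same PSU (0 if it only feeds the GPUs).


def load_fraction(psu_watts: float, gpu_count: int,
                  watts_per_gpu: float = 250.0,
                  other_watts: float = 0.0) -> float:
    """Estimated sustained draw as a fraction of the PSU's rated output."""
    total = gpu_count * watts_per_gpu + other_watts
    return total / psu_watts


def fits(psu_watts: float, gpu_count: int,
         max_fraction: float = 0.8, **kwargs) -> bool:
    """True if the estimated load stays within `max_fraction` of the rating."""
    return load_fraction(psu_watts, gpu_count, **kwargs) <= max_fraction


if __name__ == "__main__":
    for psu, gpus in ((250, 1), (1500, 4)):
        frac = load_fraction(psu, gpus)
        print(f"{psu} W PSU, {gpus} GPU(s): {frac:.0%} of rating, fits={fits(psu, gpus)}")
```

Note that the number of GPU power lines on a PSU says nothing about its wattage headroom; the budget is what decides how many cards it can actually carry.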
I must say, the T630s are very noisy compared to these, but so much more powerful. #NotGoingBack