by notacoward
2757 days ago
Same thought here. I spent two-plus years debugging Lustre issues for a very small set of customers. It was an absolute beast. Build process was a compatibility-killing, license-violating nightmare. Fell over at the slightest provocation, with little info to help figure out why. Provided no metrics to speak of, and the thicket of interrelated settings (especially timeouts) made effective tuning almost impossible. I'd guess that Amazon spent many engineer-years removing or rewriting significant pieces, and even more establishing the safe configuration envelope for what remained. Even then, it's probably a nightmare for the SREs (or whatever Amazon calls them) who have to keep it running.