Hi! I’m the one who wrote this code. My ZeRO-3 implementation is currently not working, but I’ve spoken with the DeepSpeed devs, and they’ve explained what I’ve been doing wrong. I haven’t had time to implement the fix yet, but I don’t see any reason to assume it won’t work.
https://github.com/microsoft/DeepSpeed/issues/846
Also, the specific problem described in that issue was caused by a bug I found in DeepSpeed, which has since been fixed.