by ma2rten 1919 days ago
It seems like ZeRO-3 doesn't work for them:

https://github.com/EleutherAI/gpt-neox/issues/171

3 comments

Hi! I’m the one who wrote this code. My ZeRO-3 implementation is currently not working, but I’ve spoken with DeepSpeed devs and they’ve explained to me what I’ve been doing wrong. I haven’t had time to implement the fix but I don’t see any reason to assume it won’t work.

https://github.com/microsoft/DeepSpeed/issues/846

Also, the specific problem described in that issue was due to a bug I found in DeepSpeed that has since been corrected.

Looks like they got it working recently https://github.com/EleutherAI/gpt-neox/pull/178
Did you even read through the issue? I don't see anything that indicates it won't work.
Yes, I did. The last comment is a traceback and an explanation of what would have to be done to fix it.
Your comment implied it's not possible at all for them to use it, not that it's currently not working.
I guess an ambiguity in the English language is at fault here. I meant it literally doesn't work for them, and not the idiom.