|
|
|
|
|
by Yenrabbit
833 days ago
|
|
We've got a technical post with all the juicy details coming next week. But that bit refers to packing the 4-bit weights into a type FSDP is happy to shard (like float16 or float32) which matches the other non-quantized bits of the model. This way FSDP will happily wrap and shard all the parameters as if they were just normal floats. |
|