Hacker News
by dayeye2006 726 days ago
Any idea what the main tricks are for achieving gains over FSDP?

The blog post seems to contain more details and the core ideas: https://medium.com/yandex/yafsdp-a-tool-for-faster-llm-train...
Odd that they don’t expand on this:

In Yandex’s pre-trainings, the implementation of YaFSDP along with other memory optimization strategies resulted in a speed gain of 45%.