by albertzeyer 731 days ago
The blog post seems to contain more details and the core ideas: https://medium.com/yandex/yafsdp-a-tool-for-faster-llm-train...
1 comment

Odd that they don’t expand on this:

In Yandex’s pre-trainings, the implementation of YaFSDP along with other memory optimization strategies resulted in a speed gain of 45%.