by nravic
357 days ago
This is super interesting! I think we do something similar by taking a checkpoint after model initialization. I'm curious what you think of our approach; here are some benchmarks: https://docs.cedana.ai/articles/performance-of-cedanas-gpu-i... We also do some on-the-fly optimizations (like compiling into CUDA graphs or fusing calls together), which for some inference engines also results in faster token throughput.