| HN Mirror

	by nravic 404 days ago
	This is super interesting! I think we do something similar: we take a checkpoint after model initialization. I'm curious what you think of our approach; here are some benchmarks: https://docs.cedana.ai/articles/performance-of-cedanas-gpu-i... We also do some on-the-fly optimizations (like compiling into CUDA graphs or fusing calls together), which for some inference engines results in faster token throughput too.