| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by CalmStorm 319 days ago
	For the first time, it introduced native sparse attention into the full training process, achieving up to 11× inference speedup while maintaining model performance.