Hacker News
by CalmStorm 319 days ago
For the first time, it introduced native sparse attention into the full training process, achieving up to 11× inference speedup while maintaining model performance.