Y
Hacker News
new
|
ask
|
show
|
jobs
by
CalmStorm
319 days ago
For the first time, it introduced native sparse attention into the full training process, achieving up to 11× inference speedup while maintaining model performance.