by woadwarrior01 36 days ago
The Qwen 3.5/3.6 and Kimi 2.5/2.6 models also have multi-token prediction heads baked into their weights.