by woadwarrior01, 36 days ago
The Qwen 3.5, 3.6 and Kimi 2.5, 2.6 models also have multi-token prediction heads baked into their model weights.