
by geeksinthewoods 194 days ago

The attention lottery framing feels especially timely now that DeepSeek's V3.2 tech report is out in the open. Seeing the actual top-k sparse routing and the post-training RL numbers spelled out makes the trade-offs concrete. Huge wins on speed and context, but every pruned token really is a quiet bet against the weird tail stuff that sometimes sparks real leaps...

What struck me most is how much DeepSeek's transparency accidentally lights up the closed models too. Long-context traces and million-token windows almost certainly lean on some variant of the same sparse attention under the hood. This article makes those black boxes feel a lot less mysterious. It leaves me both impressed by the engineering and quietly worried about the curiosity cost.

Also, the song / music video at the end is absurd in the best way!