Y
Hacker News
new
|
ask
|
show
|
jobs
by
_0ffh
146 days ago
Yup, as soon as data is the bottleneck and not compute, diffusion wins. Tested following the Chinchilla scaling strategy from 7M to 2.5B parameters.
https://arxiv.org/abs/2507.15857