by AlexCoventry
416 days ago
> it can adjust a whole block of tokens when it encounters some kind of disjunction.

This is true in principle for general diffusion models, but I don't think it's true for the noise model they use in Mercury (at least, going by a couple of academic papers authored by the Inception co-founders). Their model generates noise by masking tokens, and once a token is masked, it stays masked. So the reverse diffusion process gets to decide on the contents of a masked token only once, and after that it's fixed.

1. Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution - https://arxiv.org/abs/2310.16834
2. Simple and Effective Masked Diffusion Language Models - https://arxiv.org/abs/2406.07524
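
To make the "decide once" point concrete, here is a toy sketch of a masking (absorbing-state) noise process and a reverse pass that never revisits a filled-in position. It is not Mercury's implementation or the code from either paper. `MASK`, `forward_mask`, `reverse_unmask`, the unmasking schedule, and the `denoiser` callback are all hypothetical names and choices made for illustration.

```python
import random

MASK = "<mask>"  # hypothetical absorbing "mask" token


def forward_mask(tokens, t, rng):
    """Noise a sequence: each token independently becomes MASK with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]


def reverse_unmask(noisy, denoiser, steps, rng):
    """Toy reverse process over `steps` rounds.

    Each round picks some still-masked positions and fills them using
    `denoiser(seq, i)`, a stand-in for a learned model. Positions that are
    already filled are never touched again.
    """
    seq = list(noisy)
    history = [list(seq)]
    for step in range(steps, 0, -1):
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        if not masked:
            break
        # Assumed schedule: fill about 1/step of what is left, so the
        # final round (step == 1) fills every remaining mask.
        k = max(1, len(masked) // step)
        for i in rng.sample(masked, k):
            seq[i] = denoiser(seq, i)
        history.append(list(seq))
    return seq, history


if __name__ == "__main__":
    rng = random.Random(0)
    vocab = ["a", "b", "c", "d"]
    noisy = forward_mask(["a", "b", "c", "d", "a", "b"], t=1.0, rng=rng)
    # Random-guess denoiser, only to exercise the control flow.
    final, history = reverse_unmask(noisy, lambda seq, i: rng.choice(vocab), steps=3, rng=rng)
    for state in history:
        print(state)
```

In this sketch a token committed in an early round cannot be changed later, even if the later context makes it look wrong. That is the limitation the comment describes.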