The intrinsic limitation of text diffusion is that natural text contains serial dependencies where a word at the beginning of the text strongly influences what comes later, and if there is a long enough dependency chain within a diffusion block, the small number of diffusion steps may not be enough to resolve all dependencies, so that you end up with incoherent output.