by vinkelhake
468 days ago
I don't get where the author is coming from with the idea that a diffusion-based LLM would hallucinate less.

> dLLMs can generate certain important portions first, validate it, and then continue the rest of the generation.

If you pause the animation in the linked tweet (not the one on the page), you can see that the intermediate versions are full of, well, baloney. (And anyone who has messed around with diffusion-based image generation knows the models are perfectly happy to hallucinate.)
However, autoregressive models that generate one token at a time are usually more accurate than parallel models that generate multiple tokens at a time.
In diffusion LLMs, these two effects interact. You can trade them off by choosing how many tokens are generated at a time, and how many future tokens are used to predict the next set of tokens.
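
To make the first knob concrete, here is a rough sketch of how the number of tokens generated per step changes the number of model calls. It is a toy illustration, not any real dLLM's decoding loop: `decoding_schedule` is a hypothetical helper, and filling positions in left-to-right blocks is a simplifying assumption. It does not model the second knob (how many future tokens are used for prediction).

```python
def decoding_schedule(seq_len: int, tokens_per_step: int) -> list[range]:
    """Return the token positions filled in at each step.

    Hypothetical helper: assumes positions are filled in contiguous,
    left-to-right blocks, which real diffusion LLMs need not do.
    """
    if tokens_per_step < 1:
        raise ValueError("tokens_per_step must be >= 1")
    return [
        range(start, min(start + tokens_per_step, seq_len))
        for start in range(0, seq_len, tokens_per_step)
    ]


# tokens_per_step=1 corresponds to plain autoregressive decoding;
# larger values mean fewer model calls but more tokens guessed in parallel.
for k in (1, 4, 16):
    steps = decoding_schedule(16, k)
    print(f"tokens_per_step={k:2d} -> {len(steps)} model calls")
```

With one token per step you pay one model call per token; with larger blocks you make fewer calls, at the accuracy cost described above for parallel generation.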