by XenophileJKO
410 days ago
My personal belief is that diffusion models will enable higher accuracy, because unlike an autoregressive model, a diffusion model can adjust a whole block of tokens when it encounters an inconsistency. Think of the old example where an autoregressive model outputs "There are 2 possibilities..." before it has actually enumerated them. The model often has trouble overcoming that early commitment and will hallucinate a response to fit the preceding tokens. Chain of thought and other approaches help with this and related issues by incentivizing validation, etc. With diffusion, however, it is easier for the rest of the generated answer to change that set of tokens to match the number of possibilities actually enumerated. This is why I think you'll see diffusion models do more advanced problem solving with a smaller number of "thinking" tokens.
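
To make the "2 possibilities" example concrete, here is a toy Python sketch. It is not a real language model or diffusion sampler; the functions `autoregressive_answer` and `block_refined_answer` are hypothetical names I made up, and the "bias" is hard-coded purely for illustration. It only shows the difference between committing to a count up front and being allowed to revise the whole block.

```python
def autoregressive_answer(announced_count, found_items):
    """Toy left-to-right decoder: the prefix is emitted first and never revisited."""
    prefix = f"There are {announced_count} possibilities:"
    # Assumption for illustration: the body is forced to agree with the
    # already-committed prefix, so real items get dropped or fake ones added.
    body = found_items[:announced_count]
    while len(body) < announced_count:
        body.append("<hallucinated>")
    return prefix, body


def block_refined_answer(announced_count, found_items, steps=3):
    """Toy block refinement: every position in the draft can be rewritten."""
    draft_count = announced_count
    draft_items = list(found_items)
    for _ in range(steps):
        # Each pass looks at the whole draft and repairs the count if it
        # disagrees with what was actually enumerated.
        if draft_count != len(draft_items):
            draft_count = len(draft_items)
    return f"There are {draft_count} possibilities:", draft_items


if __name__ == "__main__":
    items = ["option A", "option B", "option C"]
    print(autoregressive_answer(2, items))
    print(block_refined_answer(2, items))
```

In the first function the third option is lost because the answer has to fit the "2" it already wrote; in the second, the "2" gets rewritten to match the enumeration. Real diffusion language models do this through iterative denoising over token positions rather than an explicit consistency check, so treat this strictly as an analogy.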