by byearthithatius
411 days ago
Interesting approach. However, I never thought of autoregression being _the_ current issue with language modeling. If anything, the community was generally surprised by just how far next-"token" prediction took us. Remember back when we had character-generating RNNs and were impressed that they could make almost coherent sentences? Diffusion is an alternative, but I am having a hard time understanding the whole "built-in error correction" claim, which sounds like marketing BS. Both approaches learn to approximate probability distributions, so both will naturally be error-prone because of sampling variance.
Consider "Four X" versus "Four X and seven years ago". In the first case X could be pretty much anything, but in the second case we both know the only likely completion.
So it seems like there would be a huge advantage in not having to run autoregressively. In practice, though, the advantage is smaller than you might imagine, because the AR model can internally model the probability of X conditioned on the text it hasn't output yet. In fact, because training without reinforcement drives the model to converge on the target probability of the whole output, the AR model must do some form of lookahead internally.
(That said, RLHF seems to break this product-of-probabilities property pretty badly, so maybe diffusion will suffer less intelligence loss ::shrugs::).
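To make the product-of-probabilities argument concrete, here is a toy sketch. It is not a language model. The distribution `JOINT` and the helpers `p_first`, `p_tail_given_first` and `p_first_given_tail` are hypothetical, and the probabilities are made up. The sketch assumes the AR model has exactly matched a target joint distribution over a first token X and the rest of the phrase. Under that assumption, the AR model's first-step probability P(X | "Four") is already a sum over every continuation it hasn't written yet, and multiplying the AR conditionals recovers the joint (the chain rule).

```python
import math

# Hypothetical toy target distribution P(X, tail | "Four").
# The numbers are invented for illustration and sum to 1.
JOINT = {
    ("score", "and seven years ago"): 0.25,
    ("score", "and ten"): 0.10,
    ("legs", "on a chair"): 0.25,
    ("seasons", "in a year"): 0.20,
    ("quarters", "in a dollar"): 0.20,
}


def p_first(x):
    """AR step 1: P(X | "Four"), marginalising over every tail not yet written."""
    return sum(p for (first, _), p in JOINT.items() if first == x)


def p_tail_given_first(tail, x):
    """AR step 2: P(tail | "Four", X). Assumes x has nonzero mass."""
    return JOINT.get((x, tail), 0.0) / p_first(x)


def p_first_given_tail(x, tail):
    """What a model that can see the tail could use: P(X | "Four", tail)."""
    marginal_tail = sum(p for (_, t), p in JOINT.items() if t == tail)
    return JOINT.get((x, tail), 0.0) / marginal_tail


# Chain rule: the AR product of conditionals reproduces the joint.
for (x, tail), p in JOINT.items():
    assert math.isclose(p_first(x) * p_tail_given_first(tail, x), p)

# "Four X": X is uncertain, so "score" gets only part of the mass.
print(p_first("score"))
# "Four X and seven years ago": X is determined.
print(p_first_given_tail("score", "and seven years ago"))  # 1.0
```

The point of the sketch is the first helper. Even though the AR model emits X before anything else, matching the joint forces P(X | "Four") to weight each X by the continuations it leads to. That is the implicit lookahead described above. If fine-tuning (e.g. RLHF) pushes the model away from this joint, the equality checked in the loop no longer has to hold.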