|
|
|
|
|
by psb217
640 days ago
|
|
The per token error of the non-AR model wrapped with MPC is no higher than the per token error of the non-AR model without MPC. Likelihood of the entire sequence being off the true data manifold is just one minus the product of the per token errors, whether or not you're running with the MPC loop. Ie, wrapping the non-AR model in an MPC loop and thereby converting it to an AR model (with a built-in planning mechanism) doesn't increase its probability of going off track. Per token error compounding over sequence length happens whether or not the model's autoregressive. The way in which per token errors correlate across a sequence might be more favorable wrt probability of producing bad sequences if you incorporate some explicit planning mechanism -- like the non-AR model wrapped in an MPC loop, but that's a more subtle argument than LeCun makes. |
|