by blackeyeblitzar
531 days ago
Isn’t “sentence prediction” roughly the same as multi-token prediction with a sufficiently long prediction length? In the end, are we just talking about a change to hyperparameters, or maybe a new hyperparameter that controls the granularity of the “prediction length”?

Or is multi-token prediction the same as predicting the embedding of a single complex token (one that stands for the articulation of those input tokens in a sentence)?
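
To make the first question concrete, here is a rough sketch of what I mean by a “prediction length” knob. The function name `multi_token_targets` and the `horizon` parameter are made up for illustration and are not taken from any paper or library; this only shows how training targets might be built, not how any particular model does it.

```python
def multi_token_targets(tokens, horizon):
    """Pair each prefix of `tokens` with the next `horizon` tokens.

    `horizon` is a hypothetical hyperparameter: 1 gives ordinary
    next-token prediction, larger values give multi-token prediction.
    """
    return [
        (tokens[: i + 1], tokens[i + 1 : i + 1 + horizon])
        for i in range(len(tokens) - horizon)
    ]


tokens = ["the", "cat", "sat", "on", "the", "mat", "."]

# horizon=1: the usual next-token setup
next_token = multi_token_targets(tokens, horizon=1)

# horizon=3: each prefix is paired with the next three tokens
multi_token = multi_token_targets(tokens, horizon=3)
```

If `horizon` is large enough to cover a whole sentence, the targets start to look like “sentence prediction”. That only covers the data side, though; it says nothing about whether the model emits tokens or a single embedding, which is the second question.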