by mdp2021
533 days ago
That should be proven. The two approaches - predicting tokens vs. predicting "sentences" - should be compared to see how much their outputs differ in quality. Edit2: ...and both (and their variants) should be compared to other ideas such as "multi-token prediction"... Edit: or, the appropriateness of the approach should be demonstrated once we have acquired "transparency" into how LLMs actually work internally. I am not aware of studies that make the inner workings of LLMs adequately clear. Edit3: In substance, the architecture should be as solid as possible (and the results should reflect that).