
by mdp2021 533 days ago

That should be proven. The two approaches - predicting tokens vs predicting "sentences" - should be compared to see how much their outputs differ in terms of quality.

Edit2: ...and both (and their variants) be compared to other ideas such as "multi-token prediction"...

Edit: or, the appropriateness of the approach should be demonstrated once we have acquired "transparency" into how LLMs actually work internally. I am not aware of studies that make the inner workings of LLMs adequately clear.

Edit3: Substantially, the architecture should be as solid as possible (and results should reflect that).

1 comment

blackeyeblitzar 533 days ago

Isn’t “sentence prediction” roughly the same as multi token prediction of sufficient length? In the end, are we just talking about a change to hyperparameters, or maybe a new hyperparameter that controls the granularity of “prediction length”?


mdp2021 533 days ago

> multi token prediction of sufficient length

Is multi token prediction the same as predicting the embedding of a complex token (the articulation of those input tokens in a sentence)?
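
To make the question concrete, here is a minimal sketch of how the two training objectives could differ. It is illustrative only: the function names, the sizes, and the choice of mean squared error for the embedding target are all assumptions, not taken from any particular paper or model. Multi-token prediction is modelled as K separate classification heads over the vocabulary, while "sentence" prediction is modelled as a single regression head onto one continuous vector.

```python
import math
import random

def linear(weights, x):
    """Multiply a weight matrix (list of rows) by the vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def cross_entropy(logits, target_index):
    """Negative log-probability of target_index under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target_index]

def multi_token_loss(heads, hidden, next_tokens):
    """Hypothetical objective: one classification head per future position."""
    return sum(cross_entropy(linear(head, hidden), tok)
               for head, tok in zip(heads, next_tokens))

def sentence_embedding_loss(head, hidden, target_embedding):
    """Hypothetical objective: one regression head, mean squared error."""
    pred = linear(head, hidden)
    return sum((p - t) ** 2 for p, t in zip(pred, target_embedding)) / len(pred)

def random_matrix(rows, cols, rng):
    """Small random weights, standing in for untrained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
HIDDEN, VOCAB, K, EMBED = 8, 50, 4, 16  # arbitrary sizes for illustration
hidden = [rng.uniform(-1, 1) for _ in range(HIDDEN)]

# Multi-token prediction: K heads, each scoring every vocabulary entry.
heads = [random_matrix(VOCAB, HIDDEN, rng) for _ in range(K)]
mtp = multi_token_loss(heads, hidden, next_tokens=[3, 17, 42, 8])

# Sentence prediction: one head emitting a single EMBED-dimensional vector.
sent_head = random_matrix(EMBED, HIDDEN, rng)
target = [rng.uniform(-1, 1) for _ in range(EMBED)]
sep = sentence_embedding_loss(sent_head, hidden, target)

print(f"multi-token: {K} x {VOCAB} logits, loss {mtp:.3f}")
print(f"sentence embedding: {EMBED} values, loss {sep:.3f}")
```

Under these assumptions the two are not the same thing with a different length setting: the first outputs K discrete distributions and is scored per token, the second outputs one continuous vector and is scored by distance to a target embedding. Whether that difference matters for output quality is exactly what would have to be measured.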


blackeyeblitzar 533 days ago

To be honest I don’t know. Maybe the only way to know is to build and measure all these variations.
