by cabidaher
723 days ago
Multi-token prediction looks very interesting and quite elegant. It seems more efficient than predictive sampling. From what I understand, they are essentially training the model to build some form of representation of the context as a whole, which is then used to generate the next n tokens. I feel like this is a nice next step towards "smarter" models. I wonder if a similar thing can be done for the inputs. It's a shame they didn't compare it to llama3, since they had both a 6.7B and a 13B multi-token model. From what I could gather, the instruction-tuned llama3 is much better on HumanEval, for example.