by cabidaher 723 days ago
Multi-token prediction looks very interesting and quite elegant. It seems more efficient than predictive sampling.

From what I understand, they are essentially training it to build a representation of the context as a whole, which is then used to generate the next n tokens. I feel like this is a nice next step towards "smarter" models. I wonder if a similar thing can be done for the inputs.
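
To make that reading concrete, here is a rough sketch of how I picture the training loss: one shared context vector feeds n separate output heads, and head k is scored on the token k steps ahead. This is only my interpretation, not the paper's actual architecture. The plain linear heads, the toy sizes, and every name here (`head_logits`, `multi_token_loss`, `hidden`, `heads`) are assumptions for illustration.

```python
import math
import random


def head_logits(hidden, weight):
    """Project the shared context vector (length d) through one head (d x vocab)."""
    vocab = len(weight[0])
    return [sum(h * row[v] for h, row in zip(hidden, weight)) for v in range(vocab)]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def multi_token_loss(hidden, heads, targets):
    """Sum of cross-entropy losses, one per future-token head.

    heads[k] is scored against targets[k], the token k+1 positions ahead.
    All heads read the same `hidden` vector (the assumed shared representation).
    """
    return -sum(
        log_softmax(head_logits(hidden, w))[t] for w, t in zip(heads, targets)
    )


# Toy setup: hypothetical sizes, random weights standing in for a trained model.
random.seed(0)
d, vocab, n = 8, 16, 4
hidden = [random.gauss(0, 1) for _ in range(d)]
heads = [
    [[random.gauss(0, 1) for _ in range(vocab)] for _ in range(d)]
    for _ in range(n)
]
targets = [3, 1, 4, 1]  # made-up ids for the next n tokens
loss = multi_token_loss(hidden, heads, targets)
```

With n = 1 this collapses to the usual next-token cross-entropy, so the only change in this sketch is the extra heads sharing one representation.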

It's a shame they didn't compare it to llama3, since they had both a 6.7B and a 13B multi-token model. From what I could gather, the instruction-tuned llama3 is much better on HumanEval, for example.