by kcarnold 519 days ago
This was the subject of https://arxiv.org/abs/2412.03719. (I suspect you can get away with something simpler than the paper's solution if you're only interested in the top-k.)

A related topic is "token healing", although some implementations (unfortunately including the one in HuggingFace Transformers) make some big assumptions that aren't always true (like treating spaces as special).
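
For anyone unfamiliar with the idea, here is a rough sketch of token healing over a toy vocabulary. It is not the HuggingFace implementation or any library's API: `VOCAB`, `greedy_tokenize`, and `heal` are hypothetical names made up for illustration, and the longest-match tokenizer is only a stand-in for a real one. The idea is to drop the last prompt token and only allow next tokens whose text starts with the dropped text, without treating spaces (or any other character) as special.

```python
# Toy vocabulary (an assumption for illustration; real vocabularies are far larger).
VOCAB = {"h", "t", "p", "/", ":", "://", "http"}


def greedy_tokenize(text, vocab):
    """Longest-match tokenizer over the toy vocabulary (stand-in for a real tokenizer).

    Raises ValueError if some position in `text` matches no vocabulary entry.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Pick the longest vocabulary entry that matches at position i.
        match = max((t for t in vocab if text.startswith(t, i)), key=len)
        tokens.append(match)
        i += len(match)
    return tokens


def heal(prompt, vocab):
    """Back up one token and list the tokens allowed as the next step.

    Returns (kept_tokens, removed_text, allowed_next_tokens). A real decoder
    would mask the model's next-token distribution down to the allowed set.
    """
    tokens = greedy_tokenize(prompt, vocab)
    removed = tokens[-1]
    kept = tokens[:-1]
    # Any token whose text extends the removed text is a valid continuation.
    allowed = sorted(t for t in vocab if t.startswith(removed))
    return kept, removed, allowed


kept, removed, allowed = heal("http:", VOCAB)
print(kept, removed, allowed)
```

With the prompt "http:", the tokenizer ends on ":", but a model trained on URLs would normally have seen "://" as a single token at that point. Backing up and allowing both ":" and "://" lets the model pick the token it actually expects. This sketch only backs up a single token, which is itself a simplification.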