Hacker News new | ask | show | jobs
by thatjoeoverthr 302 days ago
You can definitely "snap" it to the nearest neighbour according to the vocabulary matrix, but this comes with loss, so the "snapped" token won't behave the same. Not sure how it would score on benchmarks. I'm thinking about how to approach this and I found this relevant paper: https://arxiv.org/pdf/2302.03668 I'm hoping I can tie this back into prefix tokens.