Hacker News
by ntonozzi 365 days ago
I just found a recent paper about this: https://arxiv.org/abs/2505.15778. It's really thoughtful and well written. Instead of committing to a single output token at each step, they mix the different candidate tokens together.
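
For anyone wondering what "mixing token outputs" could look like in practice, here is a rough sketch. It assumes the mix is a probability-weighted average of token embeddings, which is my reading and not necessarily the paper's exact method. The names `softmax` and `mix_token_embeddings`, the toy vocabulary, and all the numbers are made up for illustration.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mix_token_embeddings(logits, embeddings):
    # Hypothetical helper: instead of picking one token, return the
    # probability-weighted average of all token embeddings.
    probs = softmax(logits)
    dim = len(embeddings[0])
    mixed = [0.0] * dim
    for p, emb in zip(probs, embeddings):
        for i in range(dim):
            mixed[i] += p * emb[i]
    return mixed

# Toy vocabulary of three tokens with 2-dimensional embeddings (made-up values).
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
logits = [2.0, 2.0, 0.0]  # the first two tokens are equally likely

mixed = mix_token_embeddings(logits, embeddings)
print(mixed)
```

The mixed vector would then be fed back in as the next input in place of a single token's embedding, so the model can carry several candidate continuations forward at once.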