Hacker News
by ntonozzi 365 days ago
I just found a recent paper about this: https://arxiv.org/abs/2505.15778. It's really thoughtful and well written. They mix the different token outputs together.
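
For anyone wondering what "mixing" token outputs could look like concretely, here is a rough sketch of one reading: rather than committing to a single next token, take the softmax probabilities over the vocabulary and use them to form a weighted average of the token embeddings. This is an assumption about the general idea, not the paper's actual code; the function names and the toy 3-token vocabulary are made up for illustration.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mix_token_embeddings(logits, embeddings):
    # Hypothetical helper: probability-weighted average of embedding rows,
    # one row per vocabulary token.
    probs = softmax(logits)
    dim = len(embeddings[0])
    return [
        sum(p * emb[d] for p, emb in zip(probs, embeddings))
        for d in range(dim)
    ]

# Toy vocabulary of 3 tokens with made-up 2-d embeddings.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Two equally likely tokens and one that is effectively ruled out.
logits = [2.0, 2.0, -50.0]

mixed = mix_token_embeddings(logits, embeddings)
print(mixed)
```

With two equally likely tokens, the result lands roughly halfway between their embeddings, instead of snapping to one of them as ordinary sampling would.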