Hacker News
by ntonozzi 372 days ago
Is there any work on using some kind of soft tokens for reasoning? It seems so inefficient to squeeze so much information down into a single token for the next pass of the model, when you could output a large vector on each forward pass instead. That would give the model a drastically larger working memory/scratchpad and much higher bandwidth for passing information forward to the next token call. If a single token carries at most about 17 bits of information (log2 of a roughly 131k-token vocabulary), a vector of 1024 32-bit floats could carry up to 32,768 bits.
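For anyone who wants to check the arithmetic, here is a quick sketch. The vocabulary size of 2**17 and the 32-bit float width are assumptions chosen to match the 17-bit and 32,768-bit figures; neither is stated in the post. Both numbers are upper bounds: a sampled token carries 17 bits only if every token is equally likely, and the raw storage size of a float vector says nothing about how much of it a model can actually use.

```python
import math

# Assumptions (not stated in the post): a vocabulary of 2**17 = 131,072
# tokens, and 32-bit floats for the vector entries.
VOCAB_SIZE = 2 ** 17
VECTOR_DIM = 1024
BITS_PER_FLOAT = 32


def bits_per_token(vocab_size: int) -> float:
    """Upper bound on the information in one sampled token.

    The bound is reached only when every token is equally likely.
    """
    return math.log2(vocab_size)


def raw_bits_per_vector(dim: int, bits_per_float: int) -> int:
    """Raw storage size of a dense vector, not its usable capacity."""
    return dim * bits_per_float


token_bits = bits_per_token(VOCAB_SIZE)
vector_bits = raw_bits_per_vector(VECTOR_DIM, BITS_PER_FLOAT)

print(f"bits per token:  {token_bits}")
print(f"bits per vector: {vector_bits}")
print(f"ratio:           {vector_bits / token_bits:.0f}x")
```

So under these assumptions the vector is a bit under 2000 times wider than a single token, at least on paper.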
1 comment

I just found a recent paper about this: https://arxiv.org/abs/2505.15778. It's really thoughtful and well written. They mix the different token outputs together.
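To make "mix the different token outputs together" concrete, here is a toy sketch of one way to do that kind of mixing: instead of picking a single token and feeding its embedding to the next step, feed the probability-weighted average of all token embeddings. This is my reading of the general idea, not necessarily the exact procedure in the paper. The function names, the three-token vocabulary, and the 2-d embeddings are all made up for illustration.

```python
import math


def softmax(logits):
    """Turn raw logits into probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_token(logits, embeddings):
    """Probability-weighted average of all token embeddings.

    Hypothetical helper: keeps information about every candidate token
    instead of collapsing to one choice.
    """
    probs = softmax(logits)
    dim = len(embeddings[0])
    return [
        sum(p * emb[d] for p, emb in zip(probs, embeddings))
        for d in range(dim)
    ]


def hard_token(logits, embeddings):
    """Ordinary greedy decoding: embedding of the single most likely token."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return list(embeddings[best])


# Toy data (made up): three tokens with 2-d embeddings. The model is
# torn between tokens 0 and 1 and considers token 2 very unlikely.
EMBEDDINGS = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
LOGITS = [2.0, 2.0, -50.0]

soft = soft_token(LOGITS, EMBEDDINGS)
hard = hard_token(LOGITS, EMBEDDINGS)

print("soft input:", soft)
print("hard input:", hard)
```

The hard input throws away the fact that token 1 was just as likely as token 0, while the soft input lands halfway between the two embeddings and passes that uncertainty forward.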