Hacker News new | ask | show | jobs
by x_flynn 394 days ago
I like to think of the intermediate tokens as low-dimensional hidden states. Also see the Coconut paper/citation