by pornel 434 days ago
Keep in mind that the "experts" are selected per layer, so it's not even a single expert selection you can correlate with a token, but an interplay of abstract features across many experts at many layers.
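
To make that concrete, here's a toy sketch of per-layer routing. It assumes a plain top-k router at every layer; the sizes, the random weights, and the names (make_routers, top_k_experts, route_token) are all made up for illustration and aren't taken from any real model. The "expert" update is just a stand-in so that later layers see a different hidden state.

```python
import random

# Hypothetical sizes, chosen only for illustration.
NUM_LAYERS = 4
NUM_EXPERTS = 8
TOP_K = 2
DIM = 16


def make_routers(seed=0):
    """Build one independent random router matrix (NUM_EXPERTS x DIM) per layer."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
        for _ in range(NUM_LAYERS)
    ]


def top_k_experts(router, hidden, k=TOP_K):
    """Score every expert against the hidden state and return the k best indices."""
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in router]
    ranked = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)
    return sorted(ranked[:k])


def route_token(hidden, routers):
    """Return the set of experts chosen at each layer for a single token."""
    path = []
    for router in routers:
        chosen = top_k_experts(router, hidden)
        path.append(chosen)
        # Stand-in for the real expert MLPs: nudge the hidden state based on
        # which experts fired, so the next layer routes on a different input.
        hidden = [
            h + 0.1 * sum(router[e][i] for e in chosen)
            for i, h in enumerate(hidden)
        ]
    return path


routers = make_routers()
rng = random.Random(1)
token_hidden = [rng.gauss(0, 1) for _ in range(DIM)]
path = route_token(token_hidden, routers)

for layer, experts in enumerate(path):
    print(f"layer {layer}: experts {experts}")
```

Each token comes out with one expert set per layer rather than a single label. Even with these tiny numbers that's C(8,2)^4 = 28^4 = 614,656 possible paths, which is why "token X goes to expert Y" isn't a meaningful statement.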