Hacker News new | ask | show | jobs
by ydj 30 days ago
How do you determine that the model was reasoning in Chinese in layer X? I would think the middle layers do not map into any tokens.
1 comments