by valine
388 days ago
The dimensionality depends on the vocab size and your hidden dimension size, I suppose, but that’s not really relevant. Going from latents to logits is a single linear projection. Reasoning is definitely not happening in that projection, if that’s what you mean.
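To make the shapes concrete, here is a rough sketch of that projection in plain Python. The sizes (`d_model = 4`, `vocab_size = 6`), the random weights, and the helper name `project_to_logits` are all made up for illustration and not taken from any particular model. It also leaves out the bias term some models add.

```python
import random

def project_to_logits(hidden, weight):
    """Map one hidden vector (length d_model) to logits (length vocab_size).

    weight is a vocab_size x d_model matrix stored as a list of rows,
    so each logit is a dot product between one row and the hidden vector.
    """
    return [sum(w * h for w, h in zip(row, hidden)) for row in weight]

# Hypothetical toy sizes, chosen only for illustration.
d_model, vocab_size = 4, 6
rng = random.Random(0)
weight = [[rng.uniform(-1, 1) for _ in range(d_model)] for _ in range(vocab_size)]
hidden = [rng.uniform(-1, 1) for _ in range(d_model)]

logits = project_to_logits(hidden, weight)
print(len(logits))  # one logit per vocabulary entry

# Linearity check: doubling the input doubles every logit.
scaled = project_to_logits([2 * h for h in hidden], weight)
print(all(abs(s - 2 * l) < 1e-9 for s, l in zip(scaled, logits)))
```

The weight matrix has `vocab_size * d_model` entries, which is where the dimensionality comes from. The map itself is just one matrix multiply, so whatever computation produced `hidden` has already happened by the time this step runs.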