Y
Hacker News
new
|
ask
|
show
|
jobs
by
ajtejankar
914 days ago
The base model has 32 layers and there is a single linear layer for language modeling (going from embeddings to the vocabulary) that gets applied at the very end.