Hacker News
by
hansonw
1454 days ago
It appears the indexing for the model parts is deliberately not contiguous; the 03-82 range represents the main 80 transformer layers.
https://github.com/yandex/YaLM-100B/blob/main/megatron_lm/me...
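For anyone checking the arithmetic, here is a minimal sketch of why 03-82 works out to exactly 80 layers when both endpoints are counted. The constant and function names are hypothetical and not taken from the YaLM code, and the assumption that layer 0 maps to part 03 is only inferred from the range above.

```python
# Assumption: part indices 3..82 (inclusive) hold the transformer layers,
# as described in the comment above. Names here are illustrative only.
FIRST_LAYER_PART = 3
LAST_LAYER_PART = 82


def layer_part_indices(first=FIRST_LAYER_PART, last=LAST_LAYER_PART):
    """Return every part index in the inclusive range [first, last]."""
    return list(range(first, last + 1))


def part_for_layer(layer, first=FIRST_LAYER_PART, last=LAST_LAYER_PART):
    """Map a zero-based layer number to its (assumed) part index."""
    part = first + layer
    if layer < 0 or part > last:
        raise ValueError(f"layer {layer} is outside the assumed range")
    return part


if __name__ == "__main__":
    print(len(layer_part_indices()))  # 80 indices in 3..82 inclusive
    print(part_for_layer(0), part_for_layer(79))
```

The indices below 03 and above 82 are left out of this sketch, since the comment only says what the 03-82 range holds.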
idealmedtech
1454 days ago
That makes sense, thanks for clearing it up!