by hansonw 1454 days ago
It appears the indexing for the model parts is deliberately not contiguous; the 03-82 range represents the main 80 transformer layers. https://github.com/yandex/YaLM-100B/blob/main/megatron_lm/me...
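To make the arithmetic concrete, here is a rough sketch of how the part indices could map onto layers. It assumes the 03-82 range is inclusive and that layers are counted from zero. The names `FIRST_LAYER_PART`, `LAST_LAYER_PART`, and `part_to_layer` are made up for illustration and are not taken from the YaLM code.

```python
from typing import Optional

# Assumption from the comment above: parts 03..82 (inclusive) hold the
# main transformer layers. These names are hypothetical, not from the repo.
FIRST_LAYER_PART = 3
LAST_LAYER_PART = 82


def part_to_layer(part_index: int) -> Optional[int]:
    """Return the 0-based layer number for a part index, or None if the
    index falls outside the 03-82 range."""
    if FIRST_LAYER_PART <= part_index <= LAST_LAYER_PART:
        return part_index - FIRST_LAYER_PART
    return None


# Number of parts in the inclusive range 03..82.
num_layers = LAST_LAYER_PART - FIRST_LAYER_PART + 1
```

Under these assumptions the range covers 82 - 3 + 1 = 80 parts, which matches the 80 layers mentioned above. What the indices outside that range hold is not covered here.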
1 comment

That makes sense, thanks for clearing it up!