|
|
|
|
|
by tysam_and
1205 days ago
|
|
Megatron is a Large Language Model -- unfortunately it seems they really undertrained it for the parameter counts it had, so it was more a numbers game of "hey, look how big this model is!" when they first released it. Many modern models are far more efficient for inference IIRC, though I guess it remains a good exercise in "how much can we fit through this silicon?" engineering. :D |
|