Hacker News new | ask | show | jobs
by tysam_and 1205 days ago
Megatron is a Large Language Model -- unfortunately it seems they really undertrained it for the parameter counts it had, so it was more a numbers game of "hey, look how big this model is!" when they first released it.

Many modern models are far more efficient for inference IIRC, though I guess it remains a good exercise in "how much can we fit through this silicon?" engineering. :D