by jeffchao
695 days ago
This is very impressive. An adjacent question, though: does anyone know roughly how much time and compute cost it takes to train something like the 405B? I would imagine that, with all the compute Meta has, the moat is incredibly large in terms of being able to train multiple 405B-level models and compete.
https://github.com/meta-llama/llama-models/blob/main/models/...