by nathanasmith
699 days ago
They also said in the paper that the 405B model was only trained to the "compute-optimal" point, unlike the smaller models, which were trained well past it. That suggests the larger model still had some runway: had they continued training, it would probably have kept getting stronger.