by sandGorgon
1212 days ago
>* 65B model's performance is broadly comparable to PALM-540B. Not a small feat, but also could indicate the benefits of good model-vs-token size ratios [Tables 3,4,5,6]. Their conjecture for underperforming on MMLU (multitask language understanding) compared to PALM-540B and Chinchilla-70B is smaller fraction of books and academic training data.*

What do you mean by this? The OpenAI papers say, roughly, that model performance scales with parameter count. Does this show the opposite?