by MacsHeadroom
1110 days ago
30B and 40B parameter models regularly crush GPT-3 175B (Davinci) on every benchmark. GPT-3.5 is probably a 13B parameter model, and it beats the original GPT-3 175B on most benchmarks (though not the more recent finetunes like 175B Davinci-003), so hyperparameters are clearly very important.