by ekelsen
1023 days ago
Llama2 chat performs worse and wasn't included for that reason. The numbers are different because the measurement is different. The blog post explains that we sample from the models and expect answers rather than relying on perplexity measurements.