by kamranjon
1113 days ago
I'd be really curious what the authors of the recent (3 days ago) paper on QLoRA would think of this article. https://arxiv.org/abs/2305.14314 - they claim "Guanaco, outperforms all previous openly released models on the Vicuna benchmark, reaching 99.3% of the performance level of ChatGPT while only requiring 24 hours of finetuning on a single GPU". This statement seems particularly relevant:
"We provide a detailed analysis of chatbot performance based on both human and GPT-4 evaluations showing that GPT-4 evaluations are a cheap and reasonable alternative to human evaluation. Furthermore, we find that current chatbot benchmarks are not trustworthy to accurately evaluate the performance levels of chatbots. A lemon-picked analysis demonstrates where Guanaco fails compared to ChatGPT."