| HN Mirror


	by kamranjon 1160 days ago
	I'd be really curious what the authors of the recent (3 days ago) QLoRA paper would think of this article: https://arxiv.org/abs/2305.14314. They claim "Guanaco, outperforms all previous openly released models on the Vicuna benchmark, reaching 99.3% of the performance level of ChatGPT while only requiring 24 hours of finetuning on a single GPU". This statement seems particularly relevant: "We provide a detailed analysis of chatbot performance based on both human and GPT-4 evaluations showing that GPT-4 evaluations are a cheap and reasonable alternative to human evaluation. Furthermore, we find that current chatbot benchmarks are not trustworthy to accurately evaluate the performance levels of chatbots. A lemon-picked analysis demonstrates where Guanaco fails compared to ChatGPT."