by sbierwagen
929 days ago
If you don't like machine evaluations, you can take a look at the LMSYS Chatbot Arena. You give a prompt, two chatbots answer anonymously, and you pick which answer is better: https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboar... On the human ratings, three different 7B LLMs (two different OpenChat models and a Mistral fine-tune) beat a version of GPT-3.5. (The top 9 chatbots are GPT and Claude versions. Tenth place is a 70B model. While it's great that there's so much interest in 7B models, and it's incredible that people are pushing them so far, I selfishly wish more effort would go into 13B models, since those are the biggest that my MacBook can run.)