| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by Tiberium 891 days ago

Actually, there have been new model releases after LLaMA 2. For example, for small models Mistral 7B is simply unbeatable, with a lot of good fine-tunes available for it.

Usually people compare models with all the different benchmarks, but of course sometimes models get trained on benchmark datasets, so there's no true way of knowing except if you have a private benchmark or just try the model yourself.

I'd say that Mistral 7B is still short of gpt-3.5-turbo, but Mixtral 7x8B (the Mixture-of-Experts one) is comparable. You can try them all at https://chat.lmsys.org/ (choose Direct Chat, or Arena side-by-side)

ChatGPT is a web frontend - they use multiple models and switch them as they create new ones. Currently, the free ChatGPT version is running 3.5, but if you get ChatGPT Plus, you get (limited by messages/hour) access to 4, which is currently served with their GPT-4-Turbo model.

1 comments

mark_l_watson 891 days ago

I agree with your comments and want to add re: benchmarks: I don’t pay too much attention to benchmarks, but I have the advantage of now being retired so I can spend time experimenting with a variety of local models I run with Ollama and commercial offerings. I spend time to build my own, very subjective, views of what different models are good for. One kind of model analysis that I do like are the circle displays on Hugging Face that show how a model benchmarks for different capabilities (word problems, coding, etc.)