| HN Mirror


n2d4 920 days ago

Mixtral is on-par with Gemini Pro, not Gemini Ultra (and even there it is further behind Gemini Pro than Gemini Pro is behind GPT 3.5). But to directly answer your question, they are quite well-funded, having raised over $700mil to date. I definitely wouldn't count them out.

coder543 920 days ago

Mixtral ranks higher than Gemini Pro on the (subjective) Chatbot Arena Leaderboard: https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboar...

Where are you seeing that it is "further behind Gemini Pro than Gemini Pro is behind GPT 3.5"?

jsnell 920 days ago

Presumably in the very article this HN submission is for (https://arxiv.org/pdf/2312.11444.pdf), table 1.

coder543 920 days ago

Mixtral is missing from half of the benchmarks in that paper. Hardly conclusive. It’s also common knowledge that these benchmarks have a lot of issues[0]. They're a good litmus test, but not a substitute for actually seeing how the models do in the real world.

On the topic of “hardly conclusive” things, Gemini Pro literally told me just a few minutes ago[1] that the Avatar movies did not have humans in them. There was no funny business in the prompting. At least Mixtral knows that Avatar has humans in it. Most of Gemini Pro’s responses have been fine, but not exceptional.

[0]: one random article talking about these issues: https://www.surgehq.ai//blog/hellaswag-or-hellabad-36-of-thi...

[1]: https://i.imgur.com/En37EJD.png

euazOn 920 days ago

Gemini Ultra is not out yet. With the same logic, you could compare an unreleased Mistral model with Gemini Ultra.

n2d4 920 days ago

Right. I'm just pointing out that comparing one model with a distilled version of another and then making broad statements about the companies behind them isn't really useful.

Surely you could make a comparison of two unreleased models, but it wouldn't be interesting because you don't have any real data (and benchmarks don't really mean anything).

jstummbillig 920 days ago

Debating the usefulness of HN commentary is a somewhat philosophical issue, but I think it's entirely fair to compare what is, not what might be.

Gemini Ultra is self-evidently not ready for production. What are the issues? Who knows, but in a game that as of right now is mostly about reducing the amount of brute force required, something as "simple" as not being efficient enough is not something to gloss over. If your engine's entire shtick is having the greatest graphics but you can't make it run at an acceptable fps, well, then it's not actually a usable product.

An LLM that is not actually released could very well be in a comparably dire state, and fixing it while also delivering on the promised performance might be entirely non-trivial.

sp332 920 days ago

Mistral “Medium” is available (in beta, via API) and should give better results than the “Small” Mixtral model.
