Hacker News
by elorant 840 days ago
Groq's Mixtral 8x7b nails this one though.

https://groq.com/

Sally has 1 sister. This may seem counterintuitive at first, but let's reason through it:

    We know that Sally has 3 brothers, and she is one of the sisters.
    Then we are told that each brother has 2 sisters.
    Since Sally's brothers share the same parents as Sally, they share the same sisters.
    Therefore, Sally's 3 brothers have only 1 additional sister besides Sally, making Sally's sister count 1.
It's a bit of a trick question, but it highlights the importance of understanding the phrasing and context in logical reasoning.

If you change the names and numbers a bit, e.g. "Jake (a guy) has 6 sisters. Each sister has 3 brothers. How many brothers does Jake have?" it fails completely. Mixtral is not that good, it's just contaminated with this specific prompt.

In the same fashion, lots of Mistral 7B fine-tunes can solve the plate-on-banana prompt but most larger models can't, for the same reason.

https://arxiv.org/abs/2309.08632

Meanwhile, GPT4 nails it every time:

> Jake has 2 brothers. Each of his sisters has 3 brothers, including Jake, which means there are 3 brothers in total.
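
For anyone who wants to check the arithmetic, here is a minimal sketch of the reasoning behind both puzzles. The function name `same_gender_siblings` is made up for illustration. The only assumption is that all the siblings share the same parents, so the count seen by an opposite-gender sibling includes the person asking. Note that the number of sisters Jake has (6) never enters the calculation; it is a distractor.

```python
def same_gender_siblings(others_see: int) -> int:
    """Return how many siblings of the asker's own gender the asker has.

    others_see: how many siblings of the asker's gender each
    opposite-gender sibling has. That count includes the asker.
    """
    if others_see < 1:
        raise ValueError("the count includes the asker, so it must be at least 1")
    # The asker is one of the siblings being counted, so subtract them.
    return others_see - 1


# Jake variant: each sister has 3 brothers, and Jake is one of them.
jake_brothers = same_gender_siblings(3)

# Sally original: each brother has 2 sisters, and Sally is one of them.
sally_sisters = same_gender_siblings(2)

print(jake_brothers, sally_sisters)  # 2 1
```

Changing the names and numbers only changes the argument, which is why a model that has actually learned the reasoning should not care which variant it sees.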

This is not Mistral 7B, it is Mixtral 8x7B MoE. I use the Chrome extension Chathub, and I give the same coding prompts to Mixtral and ChatGPT. Most of the time they both get it right, but cases where ChatGPT gets it wrong and Mixtral gets it right come up more often than you would expect.

That said, when I asked several models to explain some Lisp code to me, the only model that figured out the Lisp function was recursive was Claude. Every other LLM failed to notice that.

I've tested with the Mixtral on LMSYS direct chat, gen params may vary a bit of course. In my experience running it locally it's been a lot more finicky to get it to work consistently compared to non-MoE models so I don't really keep it around anymore.

3.5-turbo's coding abilities are not that great, specialist 7B models like codeninja and deepseek coder match and sometimes outperform it.

There is also Mistral-next, which they claim has advanced reasoning abilities, better than ChatGPT-turbo. I want to test it at some point. Have you tried Mistral-next? Is it no good?

You were talking about reasoning and I replied about coding, but coding requires some minimal level of reasoning. In my experience using both models to code, ChatGPT-turbo and Mixtral are both great.

>3.5-turbo's coding abilities are not that great, specialist 7B models like codeninja and deepseek coder match and sometimes outperform it.

Nice, I will keep these two in mind.

I've tried Next on LMSYS and Le Chat. Honestly, I don't think it's much different from Small, and overall kinda meh I guess? Haven't really thrown any code at it though.

They say it's more "concise", whatever that's supposed to mean. I haven't noticed it being any more succinct than the others.