My biggest issue with Devstral and even their biggest model is that they’re dangerous unless closely directed and reviewed, and i mean CLOSELY. Unfortunately Mistral models will believe and do anything.
FWIW personally i prefer this. When i tried Qwen3.6 and asked it a few questions, it did respond, but it was ADAMANT i should do something else when i really wanted an answer to the question i asked. It felt like when you search something and a stackoverflow answer about what you search for comes up and the most upvoted answer is about using/doing something else - when you want a specific answer to that specific question, not something else.
Meanwhile Devstral Small 2 just answers the damn question.
I don't want to have to convince my computer to do what i want it to do, i want it to do what i ask.
> It felt like when you search something and a stackoverflow answer about what you search for comes up and the most upvoted answer is about using/doing something else - when you want a specific answer to that specific question, not something else.
Don't you think there's usually a good reason for this? Whenever this happened to me, the problem was my ignorance.
I think there is a reason why people do that: they are trying to steer those they consider newbies away from patterns they consider bad. At the same time, this second-guessing can be annoying when you know what you want to do (especially when the original question isn't actually answered, yet it comes up in search engine results...).
I can't say if it is a good reason in general, perhaps it is, but it certainly is something i personally find annoying. I think answers should provide an answer to the question asked and then, after that answer was given, they could also give pointers for whatever they consider a better approach and why - this is important, IMO, for a public forum where people of all backgrounds and goals can read the same stuff.
But either way, LLMs IMO should do/provide what they are asked without trying to second guess the user (or at least, there should be LLMs that act like that).
FWIW i haven't used Claude or any other cloud-based LLM, only what i can run on my PC. It could be that Claude is smart enough to follow the user's instructions, keep the equivalent of a mental state of what the user seems to want to do, and only push back when it really makes sense, whereas a small local LLM is too stupid to judge all that: Qwen3.6 errs on the side of being annoyingly cautious, while Devstral Small 2 errs on the side of trusting that the user is really okay with blowing their toes off :-P. As i wrote in my original reply, this is my personal preference and i prefer the LLM to just do what i ask.