| HN Mirror


by tmikaeld 59 days ago

My biggest issue with Devstral, and even Mistral's biggest model, is that they’re dangerous unless closely directed and reviewed, and i mean CLOSELY. Unfortunately, Mistral models will believe and do anything.

See: https://petergpt.github.io/bullshit-benchmark/viewer/index.v...

Look at some of the test results; it’s horrifying.


badsectoracula 59 days ago

FWIW personally i prefer this. When i tried Qwen3.6 and asked it a few questions, it did respond, but it was ADAMANT i should do something else when i really wanted an answer to the question i asked. It felt like when you search for something and a stackoverflow question about exactly that comes up, but the most upvoted answer is about using/doing something else, when you want a specific answer to that specific question.

Meanwhile Devstral Small 2 just answers the damn question.

I don't want to have to convince my computer to do what i want it to do, i want it to do what i ask it to.


tasuki 58 days ago

> It felt like when you search something and a stackoverflow answer about what you search for comes up and the most upvoted answer is about using/doing something else - when you want a specific answer to that specific question, not something else.

Don't you think there's usually a good reason for this? Whenever this happened to me, the problem was my ignorance.


badsectoracula 58 days ago

I think there is a reason why people do that: they are trying to steer those they consider newbies away from patterns they consider bad. But at the same time this second-guessing can be annoying when you know what you want to do (especially when the original question isn't actually answered yet it comes up in search engine results...).

I can't say if it is a good reason in general, perhaps it is, but it certainly is something i personally find annoying. I think answers should provide an answer to the question asked and then, after that answer was given, they could also give pointers for whatever they consider a better approach and why - this is important, IMO, for a public forum where people of all backgrounds and goals can read the same stuff.

But either way, LLMs IMO should do/provide what they are asked without trying to second guess the user (or at least, there should be LLMs that act like that).


tmikaeld 58 days ago

That’s my experience as well: if Opus pushes back, it’s usually an actual issue with the code or prompt.


badsectoracula 58 days ago

FWIW i haven't used Claude or any other cloud-based LLM, only what i can run on my PC. So it could be that Claude is smart enough to follow the user's instructions, keep the equivalent of a mental state of what the user seems to want to do, and only push back when it really makes sense, whereas a small local LLM is too stupid to judge all that. In that case Qwen3.6 errs on the side of being annoyingly cautious while Devstral Small 2 errs on the side of trusting that the user is really okay with blowing their toes off :-P. As i wrote in my original reply, this is my personal preference and i prefer the LLM to just do what i ask.
