


	by oofbaroomf 399 days ago
	The SWE-Bench scores are very high for an open-source model of this size. 46.8% is better than o3-mini (with Agentless-lite) and Claude 3.6 (with AutoCodeRover), though a little lower than Claude 3.6 with Anthropic's proprietary scaffold. And considering you can run this for almost free, this is an extraordinary model.

3 comments

AstroBen 399 days ago

extraordinary.. or suspicious that the benchmarks aren't doing their job

echelon 399 days ago

I wasn't considering Mistral for anything, but this show of goodwill to open source is amazing. I'll have to give this a try.

qeternity 398 days ago

Mistral have a long history of open weight models...

alhimik45 397 days ago

But at the same time they don't open weights of Codestral...

sagarpatil 398 days ago

They are referring to SWE-bench Lite. Just want to make sure you are too.

svantana 398 days ago

Where did you get that idea? In the post they are repeatedly referring to SWEBench-Verified and nothing else.

sagarpatil 394 days ago

Sorry. I was wrong.

falcor84 399 days ago

Just to confirm, are you referring to Claude 3.7?

oofbaroomf 399 days ago

No. I am referring to Claude 3.5 Sonnet New, released October 22, 2024, with model ID claude-3-5-sonnet-20241022, colloquially referred to as Claude 3.6 Sonnet because of Anthropic's confusing naming.

ttoinou 399 days ago

And it is a very good LLM. Some people complain they don't see an improvement with Sonnet 3.7

Deathmax 399 days ago

Also known as Claude 3.5 Sonnet V2 on AWS Bedrock and GCP Vertex AI

SkyPuncher 399 days ago

> colloquially referred to as Claude 3.6

Interesting. I've never heard this.

simonw 399 days ago

It's the reason Anthropic called their next release 3.7 Sonnet - the 3.6 version number was already being used by some in the community to refer to their 3.5v2.

turing_complete 398 days ago

because nobody says that

NiloCK 398 days ago

Anthropic moved from 3.5, to 3.5(new), to 3.7. They skipped 3.6 because of usage in the community, and because 3.5(newer) probably passed some threshold of awfulness.

People also use 3.5.1 to refer to 3.5(new)/3.6.

The remaining difficulty now is when people refer to 3.5, without specifying (new) or (old). I find most unspecified references to 3.5 these days are actually to 3.6 / 3.5.1 / 3.5(new), which is confusing.

skerit 398 days ago

That's not correct. I have always referred to it as v3.6, and I've seen plenty of other people do so too. It's why their next model was called v3.7

moffkalast 398 days ago

The model formerly known as Claude 3.6 Sonnet?
