Hacker News
by
archeantus
423 days ago
“GPT‑4.1 scores 54.6% on SWE-bench Verified, improving by 21.4%abs over GPT‑4o and 26.6%abs over GPT‑4.5—making it a leading model for coding.”
4.1 is 26.6% better at coding than 4.5. Got it. Also... see the em dash.
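The quote's "%abs" suffix suggests the gains are absolute percentage points, not relative improvements. Assuming that reading (the quote does not spell it out), the figures imply baseline scores of 33.2% for GPT-4o and 28.0% for GPT-4.5. The sketch below works that out and contrasts points with relative gain. The helper names (`implied_baseline`, `relative_gain`) are illustrative only.

```python
# Score quoted above: SWE-bench Verified, percent of tasks resolved.
GPT41_SCORE = 54.6
# Gains quoted as "%abs"; ASSUMED to mean absolute percentage points.
ABS_GAIN = {"GPT-4o": 21.4, "GPT-4.5": 26.6}


def implied_baseline(score: float, abs_gain: float) -> float:
    """Baseline score implied by an absolute (percentage-point) gain."""
    return round(score - abs_gain, 1)


def relative_gain(score: float, baseline: float) -> float:
    """Relative improvement in percent: (new - old) / old * 100."""
    return round((score - baseline) / baseline * 100, 1)


if __name__ == "__main__":
    for model, gain in ABS_GAIN.items():
        base = implied_baseline(GPT41_SCORE, gain)
        rel = relative_gain(GPT41_SCORE, base)
        print(f"{model}: {base}% -> {GPT41_SCORE}% "
              f"(+{gain} points, +{rel}% relative)")
```

Under this reading, 4.1 is 26.6 points ahead of 4.5 on this one benchmark, which works out to roughly 95% relative. Neither figure means "26.6% better at coding" in general.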
pdabbadabba
423 days ago
What's wrong with the em-dash? That's just...the typographically correct dash AFAIK.
clbrmbr
423 days ago
Maybe a reference to the OpenAI models loving to output em-dashes?
drexlspivey
423 days ago
Should have named it 4.10
clbrmbr
423 days ago
But it’s so much weaker than 4.5 at broader tasks… Maybe it’s more optimized for benchmarks, but it’s just no replacement for a huge model.