Hacker News
by ehzb2827, 122 days ago
GLM 4.7 scores 41.0% on Terminal Bench 2.0 [1] compared to 58.4% for GPT-5.3-Codex-Spark [2].
[1] https://z.ai/blog/glm-4.7
[2] https://openai.com/index/introducing-gpt-5-3-codex-spark/
1 comment
shaklee3, 121 days ago
Which is also bad compared to 5.3 Codex. People don't seem to realize that Codex-Spark is not 5.3 Codex quality. It takes a large step down on the benchmarks to get lower latency.