| HN Mirror



	by ehzb2827 169 days ago
	GLM 4.7 scores 41.0% on Terminal Bench 2.0 [1], compared to 58.4% for GPT-5.3-Codex-Spark [2].
	[1] https://z.ai/blog/glm-4.7
	[2] https://openai.com/index/introducing-gpt-5-3-codex-spark/

1 comment

shaklee3 168 days ago

Which is also bad compared to GPT-5.3-Codex. People don't seem to realize that Codex-Spark is not GPT-5.3-Codex quality. It takes a large step down on the benchmarks in exchange for lower latency.
